A two-step process for characterization of peaks in a chromatogram is disclosed. In a first step, data
corresponding to each peak or each pair of peaks in the chromatogram is identified. A unique filter
apparatus locates extrema of the curvature of the chromatographic data and a data file is generated
containing characteristics of the extrema. A pattern recognition apparatus analyzes the characteristics of
the located extrema and classifies the peak or peak combination represented by the data in the file as
one peak or peak combination in a set of resolved peaks and selected combinations of resolved peaks.
A portion of the chromatographic data, which corresponds to the peak or peak combination identified
by the pattern recognition apparatus, is identified. This portion of the data includes both the signal for
said peak and the signal for the baseline upon which the peak is superimposed.
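
As a rough illustration of the first step, the sketch below estimates the curvature with a three-point second difference and records its strict local extrema. The function names, the three-point stencil, and the tuple layout are assumptions made for this example; this passage does not specify the disclosed filter apparatus at that level of detail.

```python
import math


def second_derivative(y, dt=1.0):
    """Three-point central-difference estimate of the second time derivative.

    The result has two fewer points than `y`; element k belongs to y[k + 1].
    """
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / (dt * dt)
            for i in range(1, len(y) - 1)]


def locate_extrema(d2, offset=1):
    """Return (index, value, kind) for each strict local extremum of `d2`.

    `offset` maps positions in `d2` back to indices of the original data.
    """
    extrema = []
    for i in range(1, len(d2) - 1):
        if d2[i] > d2[i - 1] and d2[i] > d2[i + 1]:
            extrema.append((i + offset, d2[i], "max"))
        elif d2[i] < d2[i - 1] and d2[i] < d2[i + 1]:
            extrema.append((i + offset, d2[i], "min"))
    return extrema


# Hypothetical test signal: one Gaussian peak on a sloping baseline.
signal = [5.0 + 0.01 * t + math.exp(-((t - 50) ** 2) / (2.0 * 8.0 ** 2))
          for t in range(101)]
extrema_file = locate_extrema(second_derivative(signal))
```

A linear baseline contributes nothing to the second difference, so in this sketch the extrema can be located without removing the baseline first.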
In the second step, data for a peak or a peak combination identified as described above, or in the
alternative, identified by some other process, is processed and a set of characterizing parameters for
the peak or the first peak in the peak combination is generated without a prior baseline correction to the
data. The peak data including the baseline level upon which the peak is superimposed is analyzed
using one of lookup tables, neural nets, curve fitting, or combinations of lookup tables, neural nets and
curve fitting. Each of these characterization processes, using information about the peak crest and
the peak inflection points, determines a set of characterizing parameters and a baseline estimate that
best fit the identified data. Thus, the peak characterization according to the principles of this invention
is not biased by a prior baseline correction.
said input unit means.

123. The system of Claim 122 wherein said hidden unit output signal (5) comprises a weighted function
of said plurality of input unit means output signals.

124. The system of Claim 123 wherein said output unit output signal comprises a weighted function of
said plurality of hidden unit means output signals.
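
The layered weighting recited in Claims 123 and 124 can be pictured with the following minimal sketch. The sigmoid activation on the hidden units, the linear output units, and all names and weight values are assumptions chosen for illustration; the claims only require that each layer's output be a weighted function of the previous layer's outputs.

```python
import math


def sigmoid(x):
    """Logistic squashing function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, activation):
    """One output per weight row: activation(sum_j w[j] * inputs[j] + bias)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input unit outputs -> hidden unit outputs -> output unit outputs."""
    hidden = layer(inputs, hidden_w, hidden_b, sigmoid)
    return layer(hidden, output_w, output_b, lambda v: v)
```

For example, `forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[1.0, 1.0]], [0.5])` passes two input signals through two hidden units and one output unit.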
95. The method of Claim 94, said step of comparing said calculated attribute further comprising:
comparing a measure of said calculated attribute from each filter with (i) prior measures of said calculated
attribute from that filter and (ii) prior measures of said calculated attribute from other filters to identify the
measure of said calculated attribute best estimating one of said characteristics of said attribute wherein said
other filters have a specified relationship to that filter.

96. The method of Claim 95, wherein said attribute comprises a time derivative.

97. The method of Claim 96, wherein said time derivative comprises a second time derivative.

98. The method of Claim 97, wherein said identifiable characteristics of said attribute comprise extrema
of said second time derivative.

99. The method of Claim 98, said step of using pattern recognition means further comprising:

comparing said stored extrema with extrema of the second time derivative for five groups of peaks including
(i) resolved peaks, (ii) slightly fused peaks of the same sign, (iii) slightly fused peaks of opposite sign, (iv)
strongly fused peaks of the same sign and (v) strongly fused peaks of opposite sign thereby classifying
said stored extrema as representing one of said five groups of peaks.

100. The method of Claim 94 wherein said series of filters comprises a series of digital filters.

101. A system for classifying a peak in measured signal data as one of the peaks in a group consisting
of resolved peaks and combinations of resolved peaks, wherein each of said resolved peaks and
combinations of resolved peaks in said group has an attribute which in turn has characteristics and said
characteristics of said attribute of each of said resolved peaks and each of said combinations of resolved
peaks are unique, comprising:

means (154), operatively coupled to said measured signal data, for generating a time series of data
bunches corresponding to said measured data;

means (153), operatively coupled to said data bunch generating means, for sequentially calculating said
attribute for each data bunch in said series of data bunches;

means (153), operatively coupled to said calculating means, for comparing said calculated attribute to
previously calculated attributes to identify one of said characteristics of said attributes wherein said
comparison means stores each identified characteristic; and

pattern recognition means (153), operatively coupled to said comparison means, for classifying said stored
characteristics of said attribute thereby classifying said peak;

means (153), operatively coupled to said pattern recognition means and to said analysis means, for
selecting a set of data bunches from said first-mentioned data bunches wherein said selection means
defines a time range for said set of data bunches based upon said peak classification and said stored
characteristics of said attribute; and

means (153), operatively coupled to said selection means, for generating a set of parameters, which
characterize a peak, and a baseline for a peak wherein said generating means characterizes said set of data
bunches without a prior baseline correction.

102. The system of Claim 101, said calculating means further comprising:

a plurality of filter means, each filter means calculating said selected attribute wherein:
each filter means has a characteristic time width; and
said characteristic time widths of adjacent filters differ by an integer multiple.
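
One way to picture a plurality of filters whose characteristic time widths differ by an integer multiple is to apply the same short filter to progressively coarser data bunches. The sketch below is only an assumed arrangement: bunching by averaging, the multiple of 2, the three-point curvature filter, and all function names are illustrative choices, not details taken from the claims.

```python
def bunch(data, factor):
    """Average each run of `factor` consecutive points into one data bunch."""
    n = len(data) // factor
    return [sum(data[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def curvature(bunches, width):
    """Three-point second difference, scaled by the bunch width squared."""
    return [(bunches[i - 1] - 2.0 * bunches[i] + bunches[i + 1]) / (width * width)
            for i in range(1, len(bunches) - 1)]


def filter_bank(data, base_width=1.0, levels=3, multiple=2):
    """Return [(width, attribute_values), ...], one entry per filter.

    Each successive filter sees bunches `multiple` times wider than the last,
    so adjacent characteristic time widths differ by an integer multiple.
    """
    bank = []
    series, width = list(data), base_width
    for _ in range(levels):
        bank.append((width, curvature(series, width)))
        series = bunch(series, multiple)
        width *= multiple
    return bank
```

Dividing by the squared width keeps the attribute in the same units at every level, which is what makes measures from different filters comparable.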

103. The system as in Claim 102, said analysis means further comprising:

means for comparing a measure of said calculated attribute from each filter with (i) prior measures of said
calculated attribute from that filter and (ii) prior measures of said calculated attribute from other filters to
identify the measure of said calculated attribute best estimating one of said characteristics of said attribute,
said other filters having a specified relationship to that filter.

104. The system of Claim 103, wherein said attribute comprises a time derivative. 

105. The system of Claim 104, wherein said time derivative comprises a second time derivative.

106. The system of Claim 105, wherein said identifiable characteristics of said attribute comprise 
extrema of said second time derivative. 

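107. The system of Claim 106, said pattern recognition means further comprising:

means for comparing said stored extrema with extrema of the second time derivative for five groups of
peaks including (i) resolved peaks, (ii) slightly fused peaks of the same sign, (iii) slightly fused peaks of
opposite sign, (iv) strongly fused peaks of the same sign and (v) strongly fused peaks of opposite sign
thereby classifying said stored extrema as representing one of said five groups of peaks.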

108. The system of Claim 102 wherein said plurality of filter means comprises a plurality of digital
filters.

109. A method for determining characteristics of an attribute of data, said data including data bunches
(i) selecting a second characteristic interval, said second characteristic interval being a multiple of
said first-mentioned characteristic interval;

(ii) forming a second data set from said first-mentioned data wherein each data bunch in said second
data set has said second characteristic interval;

(iii) calculating said attribute for each data bunch in said data using a filter wherein said filter has a
characteristic interval corresponding to said first characteristic interval;

(iv) calculating said attribute for each data point in said second set of data using another filter wherein
said second filter has a characteristic interval corresponding to said second characteristic interval;

(v) sequencing steps (iii) and (iv) to maintain alignment between a portion of said first and second
filters; and

(vi) comparing, for each data bunch, a measure of said calculated attribute for that data bunch with a
stored measure, said stored measure being a previously calculated measure, best estimating a characteristic
of said attribute, wherein said comparison identifies one of the measures as (i) the measure best
estimating a characteristic of said attribute and (ii) the measure corresponding to a characteristic of said
attribute.

110. The method of Claim 109 wherein said attribute comprises a time derivative (G ).

111. The method of Claim 110 wherein said time derivative (G ) comprises a second time derivative.

112. The method of Claim 111 wherein said characteristics of said attribute comprise extrema of said
second time derivative.

113. The method of Claim 109 wherein said filters comprise digital filters, each digital filter having a
leading edge.

114. The method of Claim 113 wherein step (v) maintains alignment of the leading edges of said digital
filters.

115. A system for determining characteristics of an attribute of data, said data including data bunches
with each data bunch having a characteristic interval, comprising:

means (154, 153), operatively coupled to said data, for forming data bunches (161) having a second
characteristic interval, said second characteristic interval being an integer multiple of said first-mentioned
characteristic interval;

first filter means (153), operatively coupled to said data, for generating said attribute of said data in
response to a plurality of said data bunches, wherein said first filter means has a characteristic interval
corresponding to said first characteristic interval;

second filter means, operatively coupled to said data forming means, for generating said attribute of said
data in response to a plurality of data bunches having said second characteristic interval, wherein said
second filter means has a characteristic interval corresponding to said second characteristic interval; and

means, operatively connected to said first and second filter means, for sequencing operation of said filter
means wherein said sequencing maintains alignment between said first and said second filter means with
respect to the data bunches being processed by said filter means.

116. The system of Claim 115 wherein said attribute comprises a time derivative (G ).

117. The system of Claim 116 wherein said time derivative comprises a second time derivative (G ).

118. The system of Claim 117 wherein said characteristics of said attribute comprise extrema of said
second time derivative.

119. The system of Claim 115 wherein said filters comprise digital filters, each digital filter having a
leading edge.

120. The system of Claim 119 wherein said sequencing means maintains alignment of said leading
edges of said digital filters.

121. A system for characterizing a peak superimposed on a baseline comprising:

a source of data signals (X) representing said peak superimposed on said baseline;

input unit means (201), operatively coupled to said source of data signals, each input unit means generating
an output signal in response to an input signal;

hidden unit means (202), operatively coupled to said input unit means, each hidden unit means generating



each output signal from an output unit means represents a best estimate of a parameter characteristic of
the peak.

122. The system of Claim 121 further comprising:
derivative means, operatively coupled to said source of data signals and to said input unit means, for
second inflection point at a second point in time, and a peak crest occurring at a third point in time, said
third point in time being between said first and second points in time wherein said general shape peak
comprises a convolution integral of an exponentially modified gaussian peak and another peak.

71. The method as in Claim 70, wherein said first characteristic width comprises a difference in time
between said first and said second points in time.

72. The method as in Claim 70, wherein said second characteristic width comprises a time difference
between said third point in time and said second point in time.

73. A method for generating pattern recognition means (95) for an attribute of measured data
comprising the steps of:

(i) determining specified data types observed in said measured data;

(ii) characterizing each of said specified data types in said measured data;

(iii) determining a number N of combinations of said specified data types required to reproduce all
combinations of said specified data types observed in said measured data;

(iv) determining said attribute for each of said specified data types and for each of said N
combinations of said data types; and

(v) determining means for uniquely identifying characteristics of each of said attributes for said
specified data types and for said N combinations of said data types wherein said means comprise said
pattern recognition means for said attribute of said measured data.

74. The method of Claim 73 wherein said attribute comprises a derivative (G ).

75. The method of Claim 74 wherein said derivative comprises a second derivative (G ) with respect to
time.

76. The method of Claim 73 wherein said determining means step further comprises determining an
empirical set of rules for uniquely identifying said attributes for each of said specified data types and for
said N combinations of said specified data types.

77. A method for classifying a peak in measured data as one of the peaks in a group consisting of
resolved peaks and combinations of resolved peaks, wherein each of said resolved peaks and combinations
of resolved peaks in said group has an attribute which in turn has characteristics and said characteristics of
said attribute of each of said resolved peaks and each of said combinations of resolved peaks are unique,
comprising the steps of:

(i) generating a time series of data bunches corresponding to said measured data;

(ii) calculating sequentially said attribute for each data bunch in said series of data bunches;

(iii) comparing said calculated attribute to previously calculated attributes to identify one of said
characteristics of said attributes;

(iv) storing said characteristic upon identification in step (iii);

(v) repeating steps (ii) through (iv) until characteristics sufficient for classification of said peak are
stored; and

(vi) using pattern recognition means (95) to process said stored characteristics of said attribute
thereby classifying said peak.

78. The method of Claim 77 wherein said step of calculating sequentially an attribute further comprises:
using a series of filters, each of said filters calculating said attribute, wherein:

a first filter has a characteristic time width corresponding to a characteristic time width of said series of data
bunches; and

each of said other filters has a characteristic time width that is a multiple of said characteristic time width of
said time series.

79. The method of Claim 78, the step of comparing said calculated attribute further comprising:
comparing a measure of said calculated attribute from each filter with (i) prior measures of said calculated
attribute from that filter and (ii) prior measures of said calculated attribute from other filters to identify the
measure of said calculated attribute best estimating one of said characteristics of said attribute wherein said
other filters have a specified relationship to that filter.

80. The method of Claim 79, wherein said attribute comprises a time derivative.



comparing said stored extrema with extrema of the second time derivative for five groups of peaks including
(i) resolved peaks, (ii) slightly fused peaks of the same sign, (iii) slightly fused peaks of opposite sign, (iv)
strongly fused peaks of the same sign and (v) strongly fused peaks of opposite sign thereby classifying
said stored extrema as representing one of said five groups of peaks.

84. The method of Claim 78 wherein said series of filters comprises a series of digital filters.

85. A system (154, 153) for classifying a peak in measured signal data as one of the peaks in a group
consisting of resolved peaks and combinations of resolved peaks, wherein each of said resolved peaks and
combinations of resolved peaks in said group has an attribute which in turn has characteristics and said
characteristics of said attribute of each of said resolved peaks and each of said combinations of resolved
peaks are unique, comprising:

means (154), operatively coupled to said measured signal data, for generating a time series of data
bunches corresponding to said measured data;

means (153), operatively coupled to said data bunch generating means, for sequentially calculating said
attribute for each data bunch in said series of data bunches;

means (153), operatively coupled to said calculating means, for comparing said calculated attribute to
previously calculated attributes to identify one of said characteristics of said attributes wherein said
comparison means stores each identified characteristic; and

pattern recognition means (153), operatively coupled to said comparison means, for classifying said stored
characteristics of said attribute thereby classifying said peak.

86. The system of Claim 85, said calculating means further comprising:

a plurality of filter means, each filter means calculating said attribute wherein:
each filter means has a characteristic time width; and

said characteristic time widths of adjacent filters differ by an integer multiple.

87. The system as in Claim 86, said comparison means further comprising:

means for comparing a measure of said calculated attribute from each filter with (i) prior measures of said
calculated attribute from that filter and (ii) prior measures of said calculated attribute from other filters to
identify the measure of said calculated attribute best estimating one of said characteristics of said attribute,
said other filters having a specified relationship to that filter.

88. The system of Claim 87, wherein said attribute comprises a time derivative.

89. The system of Claim 88, wherein said time derivative comprises a second time derivative.

90. The system of Claim 89, wherein said identifiable characteristics of said attribute comprise extrema
of said second time derivative.

91. The system of Claim 90, said pattern recognition means further comprising:

means for comparing said stored extrema with extrema of the second time derivative for five groups of
peaks including (i) resolved peaks, (ii) slightly fused peaks of the same sign, (iii) slightly fused peaks of
opposite sign, (iv) strongly fused peaks of the same sign and (v) strongly fused peaks of opposite sign
thereby classifying said stored extrema as representing one of said five groups of peaks.

92. The system of Claim 86 wherein said series of filters comprises a series of digital filters. 

93. A method for classifying a peak in measured data as one of the peaks in a group consisting of
resolved peaks and combinations of resolved peaks, wherein each of said resolved peaks and combinations
of resolved peaks in said group has an attribute which in turn has characteristics and said characteristics of
said attribute of each of said resolved peaks and each of said combinations of resolved peaks are unique,
comprising the steps of:

(i) generating a time series of data bunches corresponding to said measured data;

(ii) calculating sequentially said attribute for each data bunch in said series of data bunches;

(iii) comparing said calculated attribute to previously calculated attributes to identify one of said
characteristics of said attributes;

(iv) storing said characteristic upon identification in step (iii);

(v) repeating steps (ii) through (iv) until characteristics sufficient for classification of said peak are
stored;

(vi) using pattern recognition means to process stored characteristics of said attribute thereby
classifying said peak;

(vii) using said peak classification and said identified characteristics of said attribute to select a set of
data bunches from said series of data bunches for further characterization; and



using a series of filters, each of said filters calculating said attribute wherein:
a first filter has a characteristic time width corresponding to a characteristic time width of said series of data
bunches; and

each of said other filters has a characteristic time width that is a multiple of said characteristic time width of
comprises a parameter set for an exponentially modified gaussian peak having a first inflection point at a
first point in time, a second inflection point at a second point in time and a maximum displacement
occurring at a third point in time, said third point in time being between said first and second points in time.

32. The system as in Claim 31, wherein said first characteristic width comprises a time difference
between said first point in time and said second point in time.

33. The system as in Claim 32, wherein said second characteristic width comprises a time difference
between said third point in time and said second point in time.

34. The system as in Claim 30, wherein each of said peak definition parameter sets of said lookup table
comprises a parameter set for a general shaped peak having a first inflection point at a first point in time, a
second inflection point at a second point in time, and a maximum displacement occurring at a third point in
time, said third point in time being between said first and second points in time wherein said general shape
peak comprises a convolution of an exponentially modified gaussian peak and another peak.

35. The system as in Claim 34, wherein said first characteristic width comprises a difference in time
between said first and said second points in time.

36. The system as in Claim 34, wherein said second characteristic width comprises a time difference
between said third point in time and said second point in time.

37. A method for analysis of time varying signal data including a plurality of peaks superimposed on a
baseline comprising the steps of:

analyzing (93) said signal to identify a portion of said data corresponding to one of said peaks in said
plurality of peaks wherein said portion of data includes signal data for both said peak and for said baseline
upon which said peak is superimposed; and

characterizing (96) said peak and a baseline for said peak using said portion of said data corresponding to
said peak without a prior baseline correction.

38. The method as in Claim 37, said analyzing step further comprising:
digitizing said signal data to form a time series of digitized data representing said signal data.

39. The method as in Claim 38, said analyzing step further comprising:
generating a derivative of said digitized data wherein said derivative has extrema.

40. The method as in Claim 39, said analyzing step further comprising:
locating said extrema of said derivative and storing each of said located extrema.

41. The method as in Claim 40, said analyzing step further comprising:

using a pattern recognition means (95) to classify said stored extrema as representing one of a group
consisting of (i) a resolved peak and (ii) a combination of resolved peaks.

42. The method as in Claim 41, said pattern recognition step further comprising:

comparing said stored extrema with (i) extrema for resolved peaks having known characteristics and (ii)
extrema for combinations of said resolved peaks having known characteristics thereby identifying said
stored extrema as representing one of said group.

43. The method as in Claim 41, said analyzing step further comprising:

selecting a range of said digitized data using said peak identification wherein said range of said digitized
data includes at least one of said measured peaks and comprises said portion of said data.

44. The method as in Claim 37, said characterizing step further comprising:

determining a peak line shape and a baseline best fitting the data identified for said peak by said analysis
step.

45. The method as in Claim 44, said determining step further comprising:
iteratively generating estimated peak line shapes for said peak.

46. The method as in Claim 45, said determining step further comprising:

generating an error estimate for the fit of each of said iteratively generated peak line shapes to said data
identified for said peak.

47. The method as in Claim 46, said determining step further comprising:
selecting the peak line shape with the smallest error estimate thereby determining said peak line shape
best fitting said portion of data identified for said peak by said analysis step.



fitting a line to data in said difference file.

50. The method as in Claim 49, said error estimate generating step further comprising:
forming an error estimate of said line fit to data in said difference file.
squares error estimate.

52. The method as in Claim 37, said characterizing step further comprising:
using neural net means (200) for generating a characterizing set of parameters (5) for said portion of said
data identified for said peak by said analysis step.

53. The method as in Claim 52, said characterizing step further comprising:

generating a derivative of said portion of said data wherein said derivative comprises an input signal to said
neural net means.

54. The method as in Claim 53, said using neural net means step comprising:
using first neural net means for generating a characterizing set of parameters for said portion of said data
identified for said peak by said analysis steps; and

using second neural net means for generating an error estimate for said characterizing set of parameters
from said first neural net means wherein

said peak is characterized by combining said characterizing set of parameters and said error estimates for
said characterizing set of parameters.

55. The method as in Claim 37, said characterizing step further comprising:

generating a first set of peak parameters, based upon said portion of said data, wherein said first set of
peak parameters contain characterizing information about said peak.

56. The method as in Claim 55, said characterizing step further comprising:

generating a characterizing set of parameters for said peak based upon one parameter of said first set of
peak parameters wherein said characterizing set of parameters are a second set of parameters.

57. The method as in Claim 56, said generating step further comprising:

using a lookup table having a range of values for said one parameter of said first set of parameters and a
set of peak definition parameters for each of said values of said one parameter wherein said characterizing
set of parameters for said peak are generated using said first set of parameters in combination with a set of
interpolated peak definition parameters obtained by interpolation within said lookup table based on the value
of said one parameter.
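
The interpolation within a lookup table recited in Claim 57 might look like the following sketch. The table contents are made-up placeholder numbers, the key is assumed to be a single scalar (for instance a ratio of characteristic widths), and linear interpolation is an assumed choice; none of these specifics come from the claims.

```python
from bisect import bisect_right

# Hypothetical table: (value of the one parameter, peak definition parameters).
# The numbers are placeholders for illustration only.
TABLE = [
    (1.0, (0.0, 1.00)),
    (1.5, (0.5, 0.90)),
    (2.0, (1.0, 0.75)),
]


def interpolate_parameters(table, key):
    """Linearly interpolate the parameter tuple for `key` between table rows.

    `table` must be sorted by key; keys outside its range are rejected.
    """
    keys = [k for k, _ in table]
    if not keys[0] <= key <= keys[-1]:
        raise ValueError("key outside table range")
    hi = min(bisect_right(keys, key), len(keys) - 1)
    lo = hi - 1
    k0, p0 = table[lo]
    k1, p1 = table[hi]
    frac = (key - k0) / (k1 - k0)
    return tuple(a + frac * (b - a) for a, b in zip(p0, p1))
```

A call such as `interpolate_parameters(TABLE, 1.25)` blends the first two rows equally, since 1.25 lies midway between their keys.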

58. The method as in Claims 52 or 57, said characterizing step further comprising:

determining a peak line shape and a baseline best fitting the data identified for said peak by said analysis
step wherein said set of characterizing parameters is used in an initial estimate of said peak line shape.

59. The method as in Claim 58, said determining step further comprising:

iteratively generating estimated peak line shapes for said peak.

60. The method as in Claim 59, said determining step further comprising:

generating an error estimate for the fit of each of said iteratively generated peak line shapes to said data
identified for said peak.

61. The method as in Claim 60, said determining step further comprising:

selecting the peak line shape with the smallest error estimate thereby determining said line shape best
fitting the portion of said data identified for said peak by said analysis step.

62. The method as in Claim 60, said error estimate generating step further comprising:

subtracting said estimated peak line shape from said portion of data identified for said peak to form a
difference file.

63. The method as in Claim 62, said error estimate generating step further comprising:
fitting a line to data in said difference file.

64. The method as in Claim 63, said error estimate generating step further comprising:
forming an error estimate of said line fit to data in said difference file.

65. The method as in Claim 64, wherein said forming an error estimate step comprises forming a least
squares error estimate.
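
The error estimate of Claims 59 through 65 can be sketched as follows: each candidate line shape is subtracted from the data to form a difference file, a straight line is fitted to that difference, and the least squares error of the line fit ranks the candidates. A plain Gaussian is used here purely as a stand-in line shape, and all function names and the synthetic data are assumptions made for the example.

```python
import math


def gaussian(t, height, center, sigma):
    """Stand-in peak line shape sampled at the times in `t`."""
    return [height * math.exp(-((ti - center) ** 2) / (2.0 * sigma ** 2))
            for ti in t]


def fit_line(t, d):
    """Ordinary least squares straight line through the points (t, d)."""
    n = len(t)
    mean_t, mean_d = sum(t) / n, sum(d) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (di - mean_d) for ti, di in zip(t, d))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t


def line_fit_error(t, data, shape):
    """Form the difference file, fit a line to it, return (sse, slope, intercept)."""
    diff = [y - s for y, s in zip(data, shape)]
    slope, intercept = fit_line(t, diff)
    sse = sum((di - (slope * ti + intercept)) ** 2 for ti, di in zip(t, diff))
    return sse, slope, intercept


def select_best(t, data, candidate_shapes):
    """Return (index, sse, slope, intercept) for the smallest error estimate."""
    results = [line_fit_error(t, data, shape) + (i,)
               for i, shape in enumerate(candidate_shapes)]
    sse, slope, intercept, index = min(results, key=lambda r: r[0])
    return index, sse, slope, intercept
```

When a candidate matches the peak, the difference file contains only the baseline, so the fitted line doubles as the baseline estimate and no prior baseline correction is needed.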

66. The method as in Claim 57, wherein said first parameter comprises a ratio of a first characteristic
width of said peak to a second characteristic width of said peak.

67. The method as in Claim 66, wherein each of said peak definition parameter sets of said lookup table
comprises a parameter set for an exponentially modified gaussian peak having a first inflection point at a



69. The method as in Claim 57, wherein said second characteristic width comprises a time difference
between said third point in time and said second point in time.

70. The method as in Claim 65, wherein each of said peak definition parameter sets of said lookup table
comprises a parameter set for a general shaped peak having a first inflection point at a first point in time, a
further analyzed to refine the peaks. In this embodiment, for each peak, the stored data describing the
peaks is used in the EMG function (Eq. 4) to generate data that is subtracted from the raw data so as to
remove contributions from adjacent peaks and hence generate data for a single peak. Then, using the
stored set of parameters as initial estimates, a final four-vertex Simplex fit is done on this data to provide a
best estimate set of parameters for the peak and the baseline.
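
The subtraction of adjacent peaks can be sketched as below. Eq. 4 is not reproduced in this passage, so the code assumes the common textbook form of the exponentially modified gaussian, parameterized by area, gaussian center, gaussian width sigma, and exponential time constant tau; the parameterization actually used in Eq. 4 may differ. The function names and parameter ordering are hypothetical, and the Simplex fit itself is not shown.

```python
import math


def emg(t, area, center, sigma, tau):
    """Exponentially modified gaussian (assumed standard form, not Eq. 4 verbatim).

    A gaussian of width `sigma` centered at `center`, convolved with an
    exponential decay of time constant `tau`, scaled to total area `area`.
    Very small `tau` relative to `sigma` can overflow math.exp in this form.
    """
    z = (sigma / tau - (t - center) / sigma) / math.sqrt(2.0)
    growth = math.exp(sigma ** 2 / (2.0 * tau ** 2) - (t - center) / tau)
    return (area / (2.0 * tau)) * growth * math.erfc(z)


def isolate_peak(times, raw, peaks, target):
    """Subtract every stored peak except peaks[target] from the raw data.

    `peaks` is a list of (area, center, sigma, tau) tuples. The baseline is
    left in place, so the result is data for a single peak on its baseline.
    """
    isolated = []
    for t, y in zip(times, raw):
        others = sum(emg(t, *p) for i, p in enumerate(peaks) if i != target)
        isolated.append(y - others)
    return isolated
```

The isolated data, together with the stored parameters as starting values, would then be handed to whatever curve fitting routine performs the final fit.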

While the embodiment described herein provides accurate analyses in a reasonable computational time,
other embodiments could use all neural nets, all lookup tables, all Simplex analyses, or other curve fitting
analyses. The precise combination of methods used to calculate parameters would be determined by the
accuracy and computational time limitations. The embodiments described herein are illustrative only and
are not intended to limit the scope of this invention.



Claims 

1. A system for analysis of time varying signal data including a plurality of peaks superimposed on a
baseline comprising:

analysis means (154, 153), operatively coupled to said time varying signal data, for identification of a portion
of said data corresponding to one of said peaks in said plurality of peaks wherein said portion of data
includes signal data for both said peak and for said baseline upon which said peak is superimposed; and

means (153), operatively coupled to said analysis means, for characterizing a peak and a baseline for said
peak wherein:

said portion of said data identified for each peak by said analysis means is characterized by said
characterizing means without a prior baseline correction.

2. The system as in Claim 1, said analysis means further comprising: 

55 means (151). operatively coupled to said signal data, for digitizing said signal data to form a time series of 
digitizec data representing said signai data. 

3. The system as in Claim 2, said analysis means further comprising:

means, operatively coupled to said digitizing means, for generating a derivative of said digitized data
wherein said derivative has extrema.

4. The system as in Claim 3, said analysis means further comprising:

means, operatively coupled to said derivative means, for locating extrema of said derivative wherein said
locating means stores each of said located extrema.

5. The system as in Claim 4, said analysis means further comprising: 

pattern recognition means, coupled to said extrema locating means, for peak identification wherein said
pattern recognition means classifies said stored extrema as representing one of a group consisting of (i) a
resolved peak and (ii) a combination of resolved peaks.

6. The system as in Claim 5, said pattern recognition means further comprising: 

means for comparing said stored extrema with (i) extrema for resolved peaks having known characteristics
and (ii) extrema for combinations of said resolved peaks having known characteristics thereby identifying
said stored extrema as representing one of said group.

7. The system as in Claim 5, said analysis means further comprising:

means, operatively coupled to said pattern recognition means, for selecting a range of said digitized data
using said peak identification wherein said range of said digitized data includes at least one of said
measured peaks and said range of data comprises said portion of data.

8. The system as in Claim 1, said characterizing means further comprising:

curve fitting means for determining a peak line shape and a baseline best fitting the data identified for said
peak by said analysis means.

9. The system as in Claim 8, said curve fitting means further comprising:

means for iteratively generating estimated peak line shapes for said peak.

10. The system as in Claim 9, said curve fitting means further comprising:



means, operatively coupled to said error estimating means, for selecting the peak line shape with the
smallest error estimate thereby determining said peak line shape best fitting said portion of data identified
for said peak by said analysis means.
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means for subtracting said estimated peak line shape from said portion of said data identified for said peak
to form a difference file.

13. The system as in Claim 12, said error estimate generating means further comprising:

means, operatively coupled to said subtraction means, for fitting a line to data in said difference file.

14. The system as in Claim 13, said error estimate generating means further comprising:

means, operatively coupled to said line fitting means and to said subtraction means, for forming an error
estimate of said line fit to data in said difference file.

15. The system as in Claim 14 wherein said error estimate comprises a least squares error estimate. 

16. The system as in Claim 1, said characterizing means further comprising: 

neural net means (200) for generating a characterizing set of parameters (5) for said portion of said data
identified for said peak by said analysis means.

17. The system as in Claim 16, said characterizing means further comprising:

derivative means, operatively coupled to said neural net means (200), for generating a derivative of said
portion of said data wherein said derivative comprises an input signal (X) to said neural net means.

18. The system as in Claim 16, said neural net means (200) further comprising:

first neural net means for generating a characterizing set of parameters for said portion of said data 

identified for said peak by said analysis means; and

second neural net means, operatively coupled to said first neural net means, for generating an error
estimate for said characterizing set of parameters from said first neural net means wherein
said peak is characterized by a combination of said characterizing set of parameters and said error
estimate for said characterizing set of parameters.

19. The system as in Claim 1, said characterizing means further comprising:

means for generating a first set of peak parameters, based upon said portion of said data, wherein said first
set of peak parameters contain characterizing information about said peak.

20. The system as in Claim 19, said characterizing means further comprising:

means, operatively coupled to said first parameter generating means, for generating a characterizing set of
parameters for said peak based upon one parameter of said first set of peak parameters wherein said
characterizing set of parameters are a second set of parameters.

21. The system as in Claim 20, said characterizing parameter generation means further comprising:

lookup table means including a range of values for said one parameter of said first set of peak parameters
and a set of peak definition parameters for each of said values of said one parameter wherein said
characterizing set of parameters for said peak are generated using said first set of parameters in
combination with a set of interpolated peak definition parameters obtained by interpolation within said
lookup table based on the value of said one parameter.

22. The system as in Claims 16 or 21, said characterizing means further comprising:

curve fitting means, responsive to said set of characterizing parameters, for determining a peak line shape
and baseline best fitting the data identified for said peak by said analysis means.

23. The system as in Claim 22, said curve fitting means further comprising:

means for iteratively generating estimated peak line shapes for said peak.

24. The system as in Claim 23, said curve fitting means further comprising:

means, operatively coupled to said estimated line shape generating means, for generating an error estimate
for the fit of each of said iteratively generated peak line shapes to said data identified for said peak.

25. The system as in Claim 24, said curve fitting means further comprising:

means, operatively coupled to said error estimating means, for selecting the peak line shape with the
smallest error estimate thereby determining said line shape best fitting the portion of said data identified for

said peak by said analysis means. 

26. The system as in Claim 24, said error estimate generating means further comprising:

means for subtracting said estimated peak line shape from said portion of said data identified for said peak
to form a difference file.

27. The system as in Claim 26, said error estimate generating means further comprising:



29. The system as in Claim 28, wherein said error estimate comprises a least squares error estimate.

30. The system as in Claim 21 wherein said one parameter comprises a ratio (CRATIO) of a first
characteristic width of said peak to a second characteristic width of said peak.
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derivatives was precisely determined. Recall that the time of the first derivative zero crossing is the
retention time R_T. After calculation of the retention time a table of τ/σ vs. (R_T - t_g)/σ was prepared. The
coefficients C_0 to C_8 were determined by doing a polynomial fit in τ/σ to this table.

It is anticipated that curve fits using the transformed variables will be significantly faster than curve fits
using the EMG parameters. Further, it is anticipated that the accuracy of the transformed variable curve
fitting process will be at least as good as the accuracy of the Simplex process used to estimate the EMG
characterizing parameters.

The lookup table, the neural net or the Simplex fit may be used in any number of combinations to
generate estimates of the set of parameters for each peak detected. For example, a neural net could be
initially used to estimate the set of characterizing parameters for the raw data from select data 104 and
these characterizing parameters could be the initial estimate provided to the Simplex fit. Alternatively, a
lookup table could be used to provide the initial estimates for the Simplex fit.

In yet another embodiment, a combination of neural networks could be used to refine the estimates for
the set of characterizing parameters. To demonstrate one application of a combination of the neural
networks, lookup tables and Simplex fittings, data, as shown in Fig. 19, is used. This data has an initial
resolved peak 301 (Fig. 19A) and then a set of four fused peaks 302, 303, 304, 305 wherein the first pair 302,
303 (Fig. 19B) are highly fused peaks of the same sign, the second pair 303, 304 (Fig. 19C) are slightly
fused peaks of the same sign, and the third pair 304, 305 (Fig. 19D) are also slightly fused peaks of the
same sign.

For resolved peak 301, locate extrema 101 (Fig. 7) detects each of the four extrema of the curvature
and stores data describing the extrema in the extrema file, as previously described. Classify data 102, upon
determining that the extrema file includes four extrema with a baseline as one of the extrema, indicates that
a resolved peak has been found. Measure extrema 103 precisely locates the peak crest and the inflection
points for the resolved peak, as previously described. Select raw data 104 provides twenty data bunches for
analysis by calculate parameters 105.

Resolved peak 105₅₀ (Fig. 20) in calculate parameters 105 first tests the integer output from classify
extrema 102 to ascertain whether a resolved peak has been detected. Since peak 301 is a resolved peak,
resolved peak 105₅₀ passes control to true resolved peak 105₅₁. True resolved peak 105₅₁ determines that
the peak is a true resolved peak, i.e., the peak is not the last peak in a sequence of fused peaks. Since
peak 301 is a true resolved peak, processing is passed to a parameter estimate neural net 105₅₂.

In this embodiment, parameter estimate neural net 105₅₂ has been trained to generate a set of EMG
characterizing parameters for a resolved peak when sixteen evenly spaced curvature values are applied to
the net as previously described. Thus, neural net 105₅₂ generates a set of four EMG peak parameters for
resolved peak 301 using the data from select data 104 as described above. After neural net 105₅₂
completes processing, an error correction neural net 105₅₃, which has sixteen input units, a hidden layer
with thirty-three units, and an output layer of four units as in parameter estimate neural net 105₅₂, is used.

Error correction neural net 105₅₃ has been trained to generate error estimates for the set of
characterizing parameters generated by parameter estimate neural net 105₅₂. The weights in error
correction neural net 105₅₃ were defined using a process similar to that described above. However, for
neural net 105₅₃, a first set of EMG parameters were randomly selected and a second set of EMG
parameters were also randomly generated. The traces generated by using Equation 4 and the two sets of
EMG parameters were subtracted to produce an error file. The curvature of the error file was supplied as
input data to neural net 105₅₃ and in turn neural net 105₅₃ generated corrections for the first set of
randomly generated EMG parameters. The weights in neural net 105₅₃ were adjusted by back propagation
of error, as previously described, using the known error file generated by subtraction of the two traces.

In this embodiment, the output signals V_0, V_1, V_2, V_3 from neural net 105₅₂ are used to define EMG
parameters h, t_g, σ, and τ/σ as follows:
$$h = V_0\,G''(t_m)$$
$$\sigma = (2 + V_1)\,t_i$$
$$\tau/\sigma = V_2$$



are generated from the error array using Equation 8. The sixteen curvature values are applied to error
correction neural net 105₅₃ which in turn generates four correction values Y_0, Y_1, Y_2, Y_3. The EMG
estimated parameters are:
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$$t_g = (t_m - 8t_i) + t_i\,(8 + V_3 - Y_3) = t_m + t_i\,(V_3 - Y_3)$$
$$\sigma = t_i\,(V_1 - Y_1 + 2)$$
$$\tau/\sigma = V_2 - Y_2 \qquad (35)$$

After the determination of the above set of parameters for resolved peak 301, the parameters describing
resolved peak 301 are stored by store peak data 105₅₉ for further analyses, as described below. Advance
peak 105₅₄ shifts the data for resolved peak 301 out of the extrema files and initializes the extrema counter
so that it is ready to begin analysis for a second peak.

Locate extrema 101 (Fig. 7) continues to analyze the data bunches and when the eighth extremum is
detected and stored in the extrema file, classify extrema 102 analyzes the extrema file. Classify extrema
102 ascertains that extrema in the extrema file represent highly fused peaks of the same sign. Based upon
the integer from classify extrema 102, select data 104 prepares 66 data bunches for processing by
calculate parameters 105. Calculate parameters 105 ascertains from the integer provided by classify
extrema 102 that the peaks are fused and therefore resolved peak 105₅₀ (Fig. 20) passes control to prior
peak 105₅₅. Since first peak 302 (Fig. 19B) of pair 302, 303 is being analyzed, the prior peak flag is not set
so that prior peak 105₅₅ (Fig. 20) transfers to estimate fused peak parameters 105₅₆. In estimate fused
peak parameters 105₅₆, signature ratio CRATIO, as defined above, for first peak 302 is used with Table 1
and Equations 10-14 to estimate the EMG parameters for the first peak. Subsequently, the same signature
ratio CRATIO along with the parameters from measure extrema 103 for peak 303 are used to estimate the
set of EMG parameters for peak 303 using Table 1 and Equations 10-14. Hence, estimate parameters 105₅₆
generates two sets of EMG parameters.

Simplex fit 105₅₇ performs a Simplex fit for the eight vertices and generates a best fit estimate of EMG
parameters and a baseline estimate for peak 302. In another embodiment, a selected EMG parameter or
selected EMG parameters for each peak in a pair of fused peaks could be taken as the same value. When
the same parameter value is used for both of the peaks in the fused peak pair, the number of vertices in the
Simplex fit is reduced by one for each of such parameter values.

The set of parameters from Simplex fit 105₅₇ is added to the stored peak data file by store peak data
105₅₉. Advance peak 105₅₄ eliminates the four extrema for first peak 302 from the extrema file and resets
the extrema counter appropriately. In addition, advance peak 105₅₄ sets the prior peak flag.

Locate extrema 101 (Fig. 7) continues to add additional extrema to the extrema file until eight extrema
are in the file for peaks 303, 304 (Fig. 19C). Classify extrema 102 analyzes the extrema file and identifies
the extrema as representing a pair of slightly fused peaks of the same sign. Select data 104 (Fig. 7) again
identifies 66 data bunches for processing by calculate parameters 105 because classify extrema 102
classified the extrema as representing fused peaks. Resolved peak 105₅₀ (Fig. 20) in calculate parameters
105 passes control to prior peak 105₅₅. The prior peak flag was set by advance peak 105₅₄. Prior peak
105₅₅ passes control to remove prior peak 105₅₈. The parameters for peak 302 were stored by store peak
data 105₅₉. The parameters for peak 302 are used with Equation 4 in remove prior peak 105₅₈ to estimate
the contribution of peak 302 to the data bunches provided by select data 104 for peaks 303, 304 (Fig. 19C).
Remove prior peak 105₅₈ subtracts the contribution of peak 302 from the raw data so that only data
representing second and third peaks 303, 304 remains. Estimate parameters 105₅₆ and Simplex fit 105₅₇
generate a best fit set of parameters for second peak 303 in a manner similar to that described above. After
store peak data 105₅₉ saves the data describing peak 303, advance peak 105₅₄ removes the extrema for
the second peak 303 from the extrema files, initializes the extrema counter appropriately, and again resets
the prior peak flag.
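The subtraction performed by remove prior peak can be sketched as follows. This is a minimal Python illustration, not the implementation of this embodiment. Equation 4 is not reproduced in this passage, so the function `emg` below assumes the common textbook form of the exponentially modified Gaussian (a Gaussian of height h, center t_g and width σ convolved with a unit-area exponential decay of time constant τ). The names `emg` and `remove_prior_peak` are hypothetical.

```python
import math

def emg(t, h, t_g, sigma, tau):
    """Assumed textbook EMG: Gaussian (h, t_g, sigma) convolved with a
    unit-area exponential decay of time constant tau (tau > 0).
    math.exp can overflow far to the left of the peak when tau is very small."""
    z = (sigma / tau - (t - t_g) / sigma) / math.sqrt(2.0)
    expo = 0.5 * (sigma / tau) ** 2 - (t - t_g) / tau
    return h * (sigma / tau) * math.sqrt(math.pi / 2.0) * math.exp(expo) * math.erfc(z)

def remove_prior_peak(times, raw, prior_params):
    """Subtract the modeled trace of an already characterized peak from raw data."""
    h, t_g, sigma, tau = prior_params
    return [y - emg(t, h, t_g, sigma, tau) for t, y in zip(times, raw)]

# Usage: two overlapping synthetic peaks; removing the first leaves the second.
times = [0.5 * k for k in range(60)]
prior = (1.0, 5.0, 1.0, 2.0)    # h, t_g, sigma, tau of the stored peak
second = (0.7, 12.0, 1.0, 2.0)  # the peak still to be characterized
raw = [emg(t, *prior) + emg(t, *second) for t in times]
cleaned = remove_prior_peak(times, raw, prior)
```

With this assumed EMG form the area under the curve is σ·h·√(2π), which agrees with the peak area expression used elsewhere in this description.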

Locate extrema 101 (Fig. 7) then continues to find extrema until the baseline is detected. Determine
case 102 analyzes the extrema files and identifies the third pair of peaks 304, 305 (Fig. 19D) as slightly
fused peaks of the same sign. The process described above for the second pair of peaks 303, 304 is
repeated for peaks 304, 305.

After peak 304 is characterized and control is passed to locate extrema 101, locate extrema 101 passes
control to classify extrema 102 which identifies the extrema file data as a resolved peak. In calculate



removed using the known parameters for peak 304 in a manner similar to that previously described for
remove prior peak 105₅₈. Simplex fit 105₅₇ performs a four vertex Simplex analysis and generates a best fit
set of EMG parameters for peak 305.
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peaks detected as described above. Simplex fitting procedures are well known in the prior art and so only
the general procedure utilized is described herein. For a more detailed discussion of Simplex analyses see,
for example, William H. Press et al., Numerical Recipes in C: The Art of Scientific Computing, Cambridge
University Press, Cambridge, England (1988) and M.A. Sharaf et al., Chemometrics, John Wiley & Sons,
New York, 1986.

The number of parameters, vertices, used in the Simplex analysis depends upon whether a single peak
is analyzed or whether a pair of fused peaks is analyzed. For a single peak, a four vertex Simplex
procedure is used while for a pair of fused peaks up to an eight vertex Simplex procedure may be used. In
one embodiment, a seven vertex Simplex fit was used. The same EMG parameter τ was used for both
fused peaks in a fused peak sequence.

To describe the general Simplex procedure implemented in calculate parameters 105 a simple example
is used. This example is not intended to limit the scope of the invention to the embodiment shown, but
rather is intended to illustrate the general process. Since the general process utilizes a multidimensional
space, the general process is not amenable to a two- or three-dimensional presentation.

A process flow diagram for the Simplex process is illustrated in Fig. 17. Initially frame peak 105₁
defines the width of the peak in terms of the number of data points used in the Simplex process. After
frame peak 105₁, initialize vertices 105₂ defines the initial estimates for the set of peak parameters used in
the Simplex process. After the initialization of the set of parameters, a first evaluate fit 105₃ estimates the
peak shape with the EMG function defined in Equation 4 using the estimated parameters. The estimated
peak shape is subtracted from the raw data provided by select data 104 to produce a difference value at
each data point. A straight line is fit to the differences between the raw data and the estimated peak shape.
The straight line is found using Eq. 7. A least squares analysis of the fit of the line to the difference data is
used to generate an error estimate for each of the vertices. Fig. 18 illustrates three vertices 501, 502, and
503 which are estimates for the parameters of interest. In Fig. 18, vertex 503 has the highest error in
evaluate fit 105₃ (Fig. 17).
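The error estimate described for evaluate fit can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: Eq. 7 is not reproduced in this passage, so an ordinary least squares line is assumed, and the function name `line_fit_error` is hypothetical. The candidate peak shape is passed in as a list of model values, so any peak function may be used to produce it.

```python
def line_fit_error(times, raw, model):
    """Fit a straight line to (raw - model) by ordinary least squares and
    return (sum of squared residuals about the line, slope, intercept)."""
    diff = [y - m for y, m in zip(raw, model)]
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(diff) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, diff))
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    error = sum((d - (intercept + slope * t)) ** 2 for t, d in zip(times, diff))
    return error, slope, intercept

# Usage: a correct peak shape on a sloping baseline leaves a purely linear
# difference, so the error is zero and the fitted line is the baseline.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
peak = [0.0, 1.0, 4.0, 1.0, 0.0]
raw = [p + 2.0 + 0.5 * t for p, t in zip(peak, times)]
error, slope, intercept = line_fit_error(times, raw, peak)
```

Because the line absorbs any linear baseline, a vertex is penalized only for the part of the difference that a straight line cannot explain, which is how the peak and the baseline can be estimated together without a prior baseline correction.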

The next step in the analysis is to proceed iteratively to find the best fit to the data. Accordingly, max
iteration or shrink 105₄ initializes the number of iterations at zero and the shrink at zero. Max iteration or
shrink 105₄, in one embodiment, continues the Simplex process until either the shrink reaches 12 or the
number of iterations reaches 1,000. After initialization of the iteration and shrink parameters, reflect error
105₅ first locates the vertex with the maximum error, e.g., point 503 in Fig. 18, and reflects high error vertex
503 through plane 510 defined by vertices 501, 502. Vertex 504 represents the reflected image of point
503.

A second evaluate fit 105₆ (Fig. 17) uses new vertex, point 504, and recalculates the least square error for
the straight line fit as described above. The new error estimate is checked against the original error
estimate by lower error 105₇ and if the new error estimate for vertex 504 is lower than the error estimate for
original vertex 503, replace vertex 105₈ stores vertex 504 and the associated error value.

Advance direction 105₉ then takes another step in the same direction as from point 503 to 504.
Specifically, vertex 504 is projected a distance equal to 2L from plane 510 to vertex 505, as shown in Fig.
18. After advance direction 105₉ (Fig. 17), a third evaluate fit 105₁₀ is performed to generate the error for
vertex 505. If the error associated with vertex 505 is less than the error associated with vertex 504, a
second lower error test 105₁₁ passes control to second replace vertex 105₁₂. After replace vertex 105₁₂,
decrement shrink 105₁₃ decrements the shrink by one which indicates a Simplex expansion. Decrement shrink
105₁₃ passes control to max iteration or shrink 105₄.

If the error associated with vertex 505 is greater than the error associated with vertex 504, lower error
105₁₁ passes control to max iteration or shrink 105₄. Max iteration or shrink 105₄ determines whether either
the maximum number of iterations or the shrink has been met and if not begins a repeat of the process with
reflect high error 105₅.
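The vertex moves in this walk-through can be sketched in a few lines of Python. This is a hedged illustration, not the embodiment's code: it assumes the conventional Nelder-Mead construction in which the high error vertex is moved along the line through the centroid of the remaining vertices, and the function name `move_vertex` and its `factor` argument are hypothetical.

```python
def move_vertex(worst, others, factor):
    """Move the high error vertex along the line through the centroid of the
    other vertices. factor = 1 reflects it (like vertex 504), factor = 2 goes
    twice as far beyond the centroid (like vertex 505), and factor = -0.5
    moves it only half way toward the centroid (a contraction)."""
    centroid = [sum(coords) / len(others) for coords in zip(*others)]
    return [c + factor * (c - w) for c, w in zip(centroid, worst)]

# Usage in two dimensions: the other two vertices lie on the x axis.
others = [[0.0, 0.0], [2.0, 0.0]]
worst = [1.0, -1.0]
reflected = move_vertex(worst, others, 1.0)    # [1.0, 1.0]
expanded = move_vertex(worst, others, 2.0)     # [1.0, 2.0]
contracted = move_vertex(worst, others, -0.5)  # [1.0, -0.5]
```

In a full fit each candidate returned by such a function would be scored with the line fit error and kept only if its error is lower, with the iteration and shrink counters deciding when to stop.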

If after evaluate fit 105₆, the error associated with vertex 504 is greater than the error for vertex 503,
lower error 105₇ transfers to a second advance direction 105₁₄ which moves the high error vertex only half
the distance to plane 510 formed through points 501, 502. The new vertex is vertex 506. The shrink count is



retrieves the best estimates of the four EMG parameters which characterize the chromatographic peak and
performs a baseline adjustment to the raw data for the best straight line fit that has been found
corresponding to the four EMG parameters. Therefore, unlike prior art methods which used a prior
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according to the principles of this invention is not made until the final step in the process which defines the
set of parameters for each peak.

In the Simplex curve fitting, a family of curves is generated in determining the best fit to the data. For
each curve fit, a value of the EMG characterizing set of parameters σ, τ, t_g, h are estimated. If the skewness
is changed, for example, to obtain a better curve fit, each of the other EMG parameters also changes.
Moreover, the change in the EMG parameters is a complex function, i.e., the characterizing EMG
parameters are not either linearly related or related in a way that once the change in one of the EMG
parameters is known, the change in the other EMG parameters is easily ascertained. This interaction of the
EMG peak characterizing parameters makes the curve fitting complex and requires the generation of the
family of curves described above.

However, as previously described, the first set of parameters generated by the process and apparatus
of this invention, i.e., the amplitude of the peak crest, the time of the peak crest, a measure of the peak
width, and a measure of the peak skewness contain characterizing information about the measured peak.
Unfortunately, this first set of parameters is not the EMG characterizing set of parameters. Nevertheless, a
reversible transformation can map the EMG characterizing set of parameters σ, τ, t_g, h into a transformed
set of parameters which include the observed retention time R_T (i.e., the time of the peak crest), the square
root of the second moment of the measured peak, the observed amplitude (i.e., the amplitude of the peak
crest), and the τ/σ ratio. Here, a reversible transformation means that either the EMG parameters can be
mapped into the transformed parameters or the transformed parameters can be mapped into the EMG
parameters. For a description of the second moment of the measured peak, see, for example, J. Foley and
J. Dorsey, "Equations for Calculation of Chromatographic Figures of Merit for Ideal and Skewed Peaks,"
Anal. Chem. 55, 730-737 (1983).

One advantage of such a transformation is that the observed retention time and the observed amplitude
of the peak crest are known, i.e., are in the first set of parameters generated, and do not change. Moreover,
the square root of the second moment of the measured peak does not change with changes in the τ/σ ratio.
Thus, the transformation reduces the characterization process from determining the four unknown EMG
parameters to essentially finding the τ/σ ratio that gives the best fit to the measured data. Further, the
transformed variables are linear. The linear relationships further simplify the curve fit process.

In one embodiment the transformation is:

σ =
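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$$t_g = R_T - \operatorname{sgn}(\tau/\sigma)\,\sigma\,\bigl(C_0 + (C_1 + (C_2 + (C_3 + (C_4 + (C_5 + (C_6 + (C_7 + C_8\,r)\,r)\,r)\,r)\,r)\,r)\,r)\,r\bigr), \qquad r = \tau/\sigma$$

where

$$\operatorname{sgn}(\tau/\sigma) = -1 \text{ when } \tau/\sigma < 0, \qquad \operatorname{sgn}(\tau/\sigma) = +1 \text{ when } \tau/\sigma > 0$$

C_0 = -0.00595515
C_1 = 1.09834
C_2 = -0.528622
[C_3 through C_7 are not legible in the source; one of them reads -0.00207816]
C_8 = 0.0001255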
To obtain the coefficients C_0 to C_8 for the polynomial fit, the first derivative with respect to time of an
EMG function (see Equation 4) with [illegible] and various values of τ/σ were synthesized
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output signal is too close to either zero or one. The initial weights were randomly chosen to be within the
range of -0.1 to 0.1.

In the previous description, output signal S_j was scaled to generate signal V_j. In the training process, the
unscaled output unit signals are computed. The target output values EMG_j, i.e., the randomly selected
EMG parameters, are scaled to the unscaled output unit signal domain by the transformation:



$$S_j^{EMG} = \bigl(\,[\text{term illegible in source}]\,\bigr) + 0.5 \qquad (20)$$

j = 1, 2, 3, 4

where

EMG_1 = h / G''(t_m), EMG_1^MAX = 8.0

EMG_2 = τ/σ, EMG_2^MAX = 4.0

EMG_4 = t_g - t_m, EMG_4^MAX = 4.0

Accordingly, using the above expression, each target value EMG_j is converted to an output unit signal
S_j^EMG which corresponds to the unscaled output signal from that output unit.

For each output unit, an adjustment factor is defined as
$$\Delta_j = \left[S_j^{EMG} - S_j^{N}\right] S_j^{N} \left[1 - S_j^{N}\right] \qquad (21)$$

j = 1, 2, 3, 4

Adjustment factor Δ_j represents the difference of the unit output target value S_j^EMG minus neural net 200 unit
output value times the first derivative of the logistic function.

Adjustment factor Δ_j is used to adjust the weights ${}^{N}_{j}w_i$ for the signals $S_i^{N-1}$ input to output unit j. The
adjustment is accomplished by computing a correction term CT to be added to the weight ${}^{N}_{j}w_i$
corresponding to the signal $S_i^{N-1}$. Correction term CT is:
$$CT^{m+1} = \mathrm{LEARN} \cdot \Delta_j \cdot S_i^{N-1} + \mathrm{MOMENTUM} \cdot CT^{m} \qquad (22)$$
and the new weight for the next iteration m + 1 is
$${}^{N}_{j}w_i^{\,m+1} = {}^{N}_{j}w_i^{\,m} + CT^{m+1} \qquad (23)$$
where

m = iteration in training net for the set of EMG parameters

i = 1, 2, 3, ..., 33

N = 3 and
j = 1, 2, 3, 4

MOMENTUM is a constant which in one embodiment was 0.9. MOMENTUM tends to keep changing the
weight in the same direction as in the prior iteration of the training. LEARN is also a constant, 0.25 in one
embodiment, which determines the extent to which the error in the current iteration influences the new
weight. LEARN is roughly equivalent to the step size in a steepest descent optimization algorithm.
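The output layer update of Equations 21 to 23 can be written out as a brief Python sketch. It is a minimal illustration rather than the training code of this embodiment: the function names are hypothetical, the constants use the values quoted for one embodiment (LEARN = 0.25, MOMENTUM = 0.9), and a single weight is updated at a time for clarity.

```python
LEARN = 0.25     # step size quoted for one embodiment
MOMENTUM = 0.9   # momentum constant quoted for one embodiment

def output_adjustment(target, output):
    """Equation 21: (target - output) times the logistic derivative output * (1 - output)."""
    return (target - output) * output * (1.0 - output)

def update_weight(weight, prev_correction, adjustment, input_signal):
    """Equations 22 and 23: form the correction term with momentum and add it
    to the weight. Returns (new_weight, correction) so that the correction can
    be fed back in on the next iteration."""
    correction = LEARN * adjustment * input_signal + MOMENTUM * prev_correction
    return weight + correction, correction

# Usage: two successive iterations on one weight with the same training case.
adj = output_adjustment(0.8, 0.5)               # 0.3 * 0.25 = 0.075
w1, ct1 = update_weight(0.1, 0.0, adj, 1.0)     # correction 0.01875
w2, ct2 = update_weight(w1, ct1, adj, 1.0)      # momentum adds 0.9 * 0.01875
```

The second correction is larger than the first even though the error is unchanged, which shows how MOMENTUM keeps the weight moving in the direction of the prior iteration.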

After the weights in the output units have been adjusted the weights in the hidden layer units are
corrected. This correction is done in several steps. First, it is necessary to calculate another adjustment


where 
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k is the index indicating the unit in row n
m + 1 is the index indicating the iteration
j is the index indicating the unit in row n-1
Δ_k is the correction term for unit k in row n
δ_j is the correction term for unit j in row n-1
${}^{n}_{k}w_j^{\,m+1}$ is the weight in unit k in row n for the signal from the unit j in row n-1 on the m + 1 iteration of
training with a specified set of EMG input signals

Adjustment factor δ_j is used in Equation 25 to calculate the correction terms CT for the weights in each of
the units in a hidden layer and then the new weight is calculated using Equation 26:
$$CT^{m+1} = \mathrm{LEARN} \cdot \delta_j \cdot S_i^{n-2} + \mathrm{MOMENTUM} \cdot CT^{m} \qquad (25)$$
$${}^{n-1}_{j}w_i^{\,m+1} = {}^{n-1}_{j}w_i^{\,m} + CT^{m+1} \qquad (26)$$

Approximately one million EMG cases with randomly chosen values of σ, τ, and t_g were used to train
neural net 200. After this training, neural net 200 provided EMG parameter estimates that were good to
within about 1 to 2% when presented with EMG curves having parameters within the range of parameters
used in the training. The particular structure of the neural net in this embodiment is not essential to the
invention. Greater or fewer units can be used in the input and hidden layers and more hidden layers can be
included.

An important aspect of the neural net is to prescale the amplitude of the input data so that the input
units of the neural net receive values ranging from 0.1 to 0.9. Also, the width of the input data is scaled so
that the inflection points of the EMG curve fall within the time frame of the neural net input units because
this scaling assures that the net processes data with the best signal-to-noise ratio.

With the neural net 200 trained as described above, net 200 in calculate parameters 105 can be used to
estimate a set of parameters, which characterize an EMG resolved peak. Neural net 200 is effectively an
interconnected array of level sensitive switches, i.e., each unit in the net is a level sensitive switch that
generates a selected level output signal which is a weighted function of the input signals to the switch. The
selected output level is determined by the training of the net. As described above, select data unit 104
(Figs. 7, 14) has defined twenty data bunches in the time range t_m - 8t_i to t_m + 11t_i where t_m is the time of
the maximum/minimum of the peak and t_i is the width of each data bunch. The second time derivative for
each data bunch, except the first two and the last two data bunches, is evaluated using Equation 8. Hence,
sixteen second time derivatives are calculated at evenly spaced time intervals. The ith input signal X_i to
neural net 200 is:



$$X_i = 0.4\,\frac{G''(t)}{G''(t_m)} + 0.5 \qquad (27)$$

with

i = 0, t = t_m - 6t_i
i = 1, t = t_m - 5t_i
...
i = 15, t = t_m + 9t_i



This scaling of the second derivative assures that all input signals to the neural net are between 0.1 and 0.9. 
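As a quick numerical illustration of this scaling, the Python sketch below maps a few curvature values into the input range. The function name `scale_inputs` is hypothetical, and the factor 0.4 and offset 0.5 are taken from Equation 27 as read from this copy.

```python
def scale_inputs(curvatures, crest_curvature):
    """Scale curvature values G''(t) by the crest curvature G''(t_m) so that
    the crest maps to 0.9 and a zero curvature maps to 0.5."""
    return [0.4 * g / crest_curvature + 0.5 for g in curvatures]

# Usage: crest curvature -2.0 (a positive peak is concave down at its crest).
scaled = scale_inputs([-2.0, 0.0, 1.0], -2.0)  # approximately [0.9, 0.5, 0.3]
```

The result stays between 0.1 and 0.9 as long as no curvature value exceeds the crest curvature in magnitude, which is the situation the text describes for a resolved peak.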

After this scaling, the input values are applied to the neural net and the neural net generates the four
output signals V_0, V_1, V_2, V_3.

The peak amplitude h is
$$h = V_0\,G''(t_m) \qquad (28)$$

The time t_g is
$$t_g = (t_m - 8t_i) + t_i\,(8 + V_3) = t_m + V_3\,t_i \qquad (29)$$

... hrr- 



and the skewness, τ/σ, is
$$\tau/\sigma = V_2 \qquad (31)$$
Finally, the peak area A is
$$A = \sigma\,h\,\sqrt{2\pi} \qquad (32)$$
In addition to the lookup table and the neural net as described above, calculate parameters 105 also
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The neural network used in this invention is an interconnected set of units. Neural network 200 is
organized in layers of units as shown in Fig. 16. Neural net 200 has three layers. The first layer, consisting
of units 201₁-201₁₇, is input layer 201. Input signals X_1-X_16 are applied to units 201₁-201₁₆ respectively.
The input signals X_1-X_16 are the curvature for sixteen evenly spaced data bunches which are calculated
using Equation 8. Sixteen data bunches plus four additional data bunches are selected from the data
bunches identified by select data 104 for calculation of the sixteen curvature values.

Input layer 201 is followed by one or more "hidden" layers, units 202₁-202₃₃ (Fig. 16), and terminated
by an output layer 203, units 203₁-203₄. Neural net 200 is illustrative of only one embodiment of this
invention and is not intended to limit the scope of the invention. According to the principles of this invention,
a neural net with more hidden layers or a neural net with more or fewer units in each layer can be used.

The output signals of each unit in a given layer n of the network are supplied to each unit in the next
layer n+1 and no others. However, in every layer except the output layer, there is one unit, 201_17 and 202_33,
with a fixed output signal of one and no input signals. Input signals X_1-X_16 to net 200 are applied to input
units 201_1-201_16, one value per unit. The output signal of each input unit is just the input signal. All units in
the neural net 200, except input units 201_1-201_16 and fixed units 201_17, 202_33, transform the set of input
signals from the previous layer of the network into a single output signal using a sigmoidal or logistic
function.

Specifically, the output signal S_j^n of unit j in the nth layer of the network is:

$$ S_j^n = \frac{1}{1 + e^{-x}}, \qquad n = 2, \ldots, N \tag{15} $$

$$ x = \sum_{i=1}^{k} w_{ji}^{n}\, S_i^{n-1} $$

where

S_j^n is the output signal of the jth unit in the nth layer of the neural network
k is the number of units in the n-1 layer of the neural network
w_ji^n is the weight in the jth unit of the nth layer for output signal S_i^(n-1) from the ith unit in the n-1 layer
N is the number of layers in the network.

Thus, each input signal S_i^(n-1) from layer n-1 of the network is
assigned a unique weight w_ji^n. The information content of neural net 200 is carried in the values of
weights w_ji^n.
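A forward pass through such a layered net can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the names `logistic` and `forward` and the nested-list weight layout are hypothetical, and the fixed unit of each non-output layer is modelled by appending a constant signal of one to that layer's outputs.

```python
import math

def logistic(x):
    """Logistic function of Eq. 15; output lies between zero and one."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights):
    """Propagate input signals through the net one layer at a time.

    weights[n][j] lists the weights of unit j in one layer: one weight per
    signal from the previous layer, plus a final weight for that layer's
    fixed unit (output signal of one).
    """
    signals = list(inputs)  # input units pass their input signal through
    for layer in weights:
        previous = signals + [1.0]  # append the fixed unit's output
        signals = [
            logistic(sum(w * s for w, s in zip(unit, previous)))
            for unit in layer
        ]
    return signals
```

For the dimensions described for neural net 200, `weights` would hold 32 hidden units with 17 weights each, followed by 4 output units with 33 weights each.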

When input signals X_1-X_16 (Fig. 16) are applied to neural net 200, signals propagate through the net a
layer at a time until signals reach output layer 203. The amplitude range of output signals from output layer
203 is between zero and one, but typically this is a more limited domain than is desirable. Therefore, each
output unit signal S_j^N is scaled so that the resulting output signal V_j falls within the range of -0.9·MV and
0.9·MV, where MV is the maximum value of the peak parameter. Constant MV was selected based upon
empirical experience with neural nets and the analysis of chromatographic data. Scaling of data over a
selected range is a method known to one skilled in the art of data analysis.

Specifically,
$$ V_j = (S_j^N - 0.5) \cdot MV_j \cdot 2 \tag{16} $$
where

V_j is the output signal for the jth unit in layer N of the neural net
MV_j is selected so that -0.9 MV_j < V_j < 0.9 MV_j for parameter j
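Equation 16 and its inverse can be sketched as follows. The function names are hypothetical, and the inverse is an assumption about how a desired parameter value would be mapped to a training target; the source does not state it.

```python
def scale_output(s, mv):
    """Eq. 16: map an output unit signal s in (0, 1) to a parameter value."""
    return (s - 0.5) * mv * 2.0

def target_signal(v, mv):
    """Inverse of Eq. 16 (assumed): unit signal for parameter value v."""
    return v / (2.0 * mv) + 0.5
```

Under Equation 16 the bound of ±0.9 MV corresponds to unit signals between 0.05 and 0.95, which keeps the output units away from the flat ends of the logistic function.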

Prior to using neural net 200 to process sixteen evenly spaced second time derivative values, net 200
was first trained, i.e., the weights in neural net 200 were defined.

[...] parameters for an EMG curve. Using the neural network [...], the neural network processed the values
to produce an estimate of the EMG parameters used to generate the input data.

The estimated EMG parameters were compared with the actual EMG parameters and the neural
network weights were adjusted to minimize the error between the parameter generated by the network and the
[...] back-propagating errors. See, for example, D. Rumelhart, G. Hinton, and R. Williams, "Learning
representations by back-propagating errors," Nature 323, pp. 533-536 (1986). The back propagation was repeated
approximately one million times until the output signals of neural net 200 matched the desired output
signals to within the desired accuracy for a specified large number of test cases.

Specifically, in one embodiment, the following procedure was used to train neural net 200 to generate
EMG parameters from sixteen evenly spaced time values of the second time derivative of an EMG function.
Values for σ, τ and t_g were randomly selected. The values for each parameter were selected from a domain
having the range of that EMG variable. The algorithm used to generate the random values for the EMG
parameters also prescaled σ and τ so that the time between inflection points on either side of the EMG
peak was between 6 and 12, and prescaled t_g to range within ±1. Specifically,

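$$ h = 25{,}000 $$

$$ \sigma = 2 + \frac{\mathrm{RAND}}{32{,}768} \quad \text{[reading uncertain; this equation is partly illegible in the source]} $$

$$ \frac{\tau}{\sigma} = (3.7)\left(\frac{\mathrm{RAND}}{32{,}768}\right) + 0.1 \tag{17} $$

$$ t_g = 10 + \left(\frac{\mathrm{RAND}}{32{,}768}\right) $$

where RAND is a random number between 0 and 32,768. This scaling generates a set of parameters within
the range required to generate an EMG peak.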

After the random values of the EMG parameters have been generated, the EMG function G(t) in
Equation (4) was evaluated using the generated parameters and integer time values. For the above scaling,
the time values were 0, 1, 2, ..., 29. Again G(t) (Equation 4) was evaluated using the series expansion
described above. The digital filter for the curvature, G''(t), which is proportional to the second time derivative
of the EMG function, was evaluated at time points 2, 3, ..., 27.

After curvature G''(t) was calculated for each time t, a value of t_m was found so that G''(t_m) ≤ G''(t) for
all t in the range (t_g - 5σ) < t ≤ (t_g + 5σ). G''(t_m) was the curvature corresponding to the
maximum/minimum of the EMG function. After the time t_m was defined, the value of the second derivative
G''(t_m) was used to normalize the 16 values of G'' consisting of G''(t_m-6), G''(t_m-5), ..., G''(t_m+9). Specifically,

$$ C''(t) = 0.4\,\frac{G''(t)}{G''(t_m)}, \qquad t = t_m - 6,\; t_m - 5,\; \ldots,\; t_m + 9 \tag{18} $$



The input signals to neural net 200 are

$$
X_i = C''(t) + 0.5, \qquad
\begin{cases}
i = 0, & t = t_m - 6 \\
i = 1, & t = t_m - 5 \\
\;\;\vdots \\
i = 15, & t = t_m + 9
\end{cases}
\tag{19}
$$



Prior to providing the first set of input signals to neural net 200, all the weights in the net must be
initialized to a different value, since if two units of the same layer have identical weights on corresponding
input signals, the units will continue to have the same output signals and weights throughout the training
process. [The remainder of this passage is illegible in the source.]
TABLE 1

 i    cratio_i    τ/σ_i       sigma_i     height_i     rtime_i
 1    ?           0.100000    0.491707    -0.149150    -0.043384
 2    0.495770    0.300000    0.478256    -0.166981    -0.218163
 3    0.488131    0.500000    0.462633    -0.190488    -0.345843
 4    0.480089    0.700000    0.448277    -0.218996    -0.441499
 5    0.470309    0.900000    0.436728    -0.250592    -0.509524
 6    ?           1.100000    0.428109    -0.285412    -0.5658?4
 7    0.462249    1.300000    0.420133    -0.320610    -0.623333
 8    0.456172    1.500000    0.413130    -0.355721    -0.660210
 9    0.448825    1.700000    0.408687    -0.393306    -0.683244
10    0.443877    1.900000    0.404198    -0.431676    -0.708442
11    0.435612    2.100000    0.400068    -0.472040    -0.719531
12    0.432274    2.300000    0.396573    -0.511801    -0.739838
13    0.428108    2.500000    0.393350    -0.552216    -0.754311
14    0.426?40    2.700000    0.391009    -0.596?44    ?
15    0.421326    2.900000    0.389283    -0.637783    -0.776923
16    0.416583    3.100000    0.386530    -0.681665    -0.785714
17    0.416855    3.300000    0.385099    -0.724638    -0.800000
18    0.417341    3.500000    0.385416    -0.768505    -0.812329
19    0.416615    3.700000    0.383183    -0.812269    -0.824657

(? marks an entry or digit that is illegible in the source.)



As defined previously, skewness is the ratio of peak half widths at a selected peak height. In this
embodiment, the peak half widths were measured at the half-height of the Gaussian peak. Table 1 was
generated using 19 different EMG curves. The range of parameters for the EMG curves was selected so
that the data in Table 1 bound the EMG curves usually encountered in HPLC.

For a calculated signature ratio, CRATIO, the value of τ/σ is determined by first finding the values
cratio_(i-1), cratio_i in Column 2 of Table 1 which bound calculated signature ratio CRATIO. If calculated
signature ratio CRATIO is greater than or equal to 0.5, CRATIO is set to 0.4999 and if CRATIO is less than
or equal to 0.41658, CRATIO is set to 0.41658. A fraction, fr, is defined as:

$$ fr = \frac{\mathrm{CRATIO} - \mathit{cratio}_i}{\mathit{cratio}_{i-1} - \mathit{cratio}_i} \tag{10} $$

where cratio_i and cratio_(i-1) are the values in Table 1 that bracket CRATIO. If fraction fr is greater than one,
the fraction is set to one.

The estimated skewness parameter, ns, is:
$$ ns = (fr) \times [\tau/\sigma]_{i-1} + (1 - fr) \times [\tau/\sigma]_i \tag{11} $$

After estimated skewness, ns, for the trailing peak is determined, the time difference in zero crossings
for the trailing peak is converted to an estimated peak width parameter, nw, using the values, sigma_i, in the
third column of Table 1 and fraction fr, calculated above:
$$ nw = [(fr)(\mathit{sigma}_{i-1}) + (1 - fr)(\mathit{sigma}_i)] \times [\mathit{zcross}_2 - \mathit{zcross}_1] \tag{12} $$
where

nw = estimated Gaussian standard deviation
zcross_1 = time of first zero crossing of second derivative

[The definition of zcross_2 and the sentence introducing the estimated peak amplitude parameter, nh, are illegible in the source.]

$$ nh = [(fr)(\mathit{height}_{i-1}) + (1 - fr)(\mathit{height}_i)] \times nw \times nw \times \mathit{mincurv} \tag{13} $$

This expression for the estimated peak amplitude parameter converts the peak width parameters generated
in the processing of the data to an EMG amplitude parameter.

[...] the estimated peak width, nw, and time, maxt, of the center of the trailing peak:
$$ nrt = [(fr)(\mathit{rtime}_{i-1}) + (1 - fr)(\mathit{rtime}_i)] \times nw + \mathit{maxt} \tag{14} $$

Thus, using interpolated peak definition parameters obtained from lookup Table 1 and the set of
parameters containing characterizing information for the peak, the EMG parameters for the trailing peak
have been completely defined. Using the estimated parameters nw, ns, nh, nrt, the EMG trace is calculated
using the series expansion (Eq. 9) described above in Eq. 4 and subtracted from the raw data.
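The interpolation of Equations 10 through 14 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `estimate_emg` and the tuple layout are hypothetical, only rows 2 through 5 of Table 1 are included, and the clamp uses the limits of this excerpt rather than the 0.4999 and 0.41658 limits that apply to the full table.

```python
# Excerpt of Table 1, rows 2-5: (cratio, tau_over_sigma, sigma, height, rtime)
TABLE = [
    (0.495770, 0.3, 0.478256, -0.166981, -0.218163),
    (0.488131, 0.5, 0.462633, -0.190488, -0.345843),
    (0.480089, 0.7, 0.448277, -0.218996, -0.441499),
    (0.470309, 0.9, 0.436728, -0.250592, -0.509524),
]

def estimate_emg(cratio, zcross1, zcross2, curvature, maxt, table=TABLE):
    """Return (ns, nw, nh, nrt) interpolated from the lookup table."""
    # Clamp to the range covered by this excerpt (assumption, see lead-in).
    cratio = min(max(cratio, table[-1][0]), table[0][0])
    # Find adjacent rows i-1 (prev) and i (cur) that bracket cratio.
    for prev, cur in zip(table, table[1:]):
        if cur[0] <= cratio <= prev[0]:
            break
    fr = min((cratio - cur[0]) / (prev[0] - cur[0]), 1.0)      # Eq. 10

    def mix(col):
        return fr * prev[col] + (1.0 - fr) * cur[col]

    ns = mix(1)                                                # Eq. 11
    nw = mix(2) * (zcross2 - zcross1)                          # Eq. 12
    nh = mix(3) * nw * nw * curvature                          # Eq. 13
    nrt = mix(4) * nw + maxt                                   # Eq. 14
    return ns, nw, nh, nrt
```

Because the height entries are negative and the curvature at the crest of a positive peak is negative, the estimated amplitude nh comes out positive.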

Adjust raw data 104_4, in this embodiment, has processed the extracted raw data so that only the data
for a single resolved peak remains. This data is analyzed by calculate parameters 105 to determine the
EMG parameters that characterize the selected data.

If calculate parameters 105 processes fused peaks, then adjust raw data 104_4 is similar to the
embodiment described above except the trailing peak is not removed from the raw data. Hence, in this
embodiment, adjust raw data 104_4 (Fig. 14) subtracts any contribution from a prior peak and then provides
calculate parameters 105 with data corresponding to a pair of fused peaks.

As previously described, calculate parameters 105 (Fig. 7) analyzes the raw data from select data 104
and generates a set of parameters which characterize each peak in the raw data. Alternatively, calculate
parameters 105 can be used to characterize peaks provided by any process that identifies data for a peak.
Calculate parameters 105 uses one of (i) a lookup table; (ii) a neural net; (iii) curve fitting; and (iv)
combinations of lookup tables, neural nets or curve fits to generate a set of characterizing parameters for
each peak. The precise method used in calculate parameters 105 depends upon the accuracy and the
computational time desired.

Lookup tables and neural nets are both computationally fast, but the achievable accuracy depends on
the accuracy of the input parameters and the detail built into either the lookup table or the neural network.
Simplex fits are somewhat slower than lookup tables or neural nets, but such fits provide a high degree of
accuracy. As indicated above, calculate parameters 105 may also include a combination of these
approaches to achieve a balance between speed and accuracy.

The process described above in which the signature ratio CRATIO was used to estimate the parameters
for a trailing peak in select data 104 can be implemented directly in calculate parameters 105. The use of
CRATIO and Table 1 is an example of using a lookup table in calculate parameters 105. In fact, the general
approach of using a lookup table may be expanded to include generalized functions rather than EMG
functions as in Table 1.

The more generalized lookup table in calculate parameters 105 would be, in one embodiment, based
upon a convolution integral of an EMG function and an instrument function. The instrument function is
determined by passing an eluent from the injector directly through the detector and measuring the
resulting detector response. Unlike the measurement to determine a threshold level, i.e., a blank
chromatographic run, the generation of the instrument function does not include the characteristics of the
chromatographic column, but rather only information about the detection and injection systems and the associated
electronics. The measured instrument function and an EMG function, as represented by Equation 4 above,
would be convolved using a convolution integral to generate a general chromatographic function.
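For sampled data the convolution integral becomes a discrete sum. The sketch below is an illustration only: the function names are hypothetical, the peak is a plain Gaussian standing in for the EMG function of Equation 4, and the exponential decay is a stand-in for a measured instrument function, which the source obtains experimentally.

```python
import math

def convolve(peak, instrument):
    """Full discrete convolution of two equally spaced sample lists."""
    out = [0.0] * (len(peak) + len(instrument) - 1)
    for i, p in enumerate(peak):
        for j, r in enumerate(instrument):
            out[i + j] += p * r
    return out

def normalize(samples):
    """Scale samples to unit sum so that convolution preserves peak area."""
    total = sum(samples)
    return [s / total for s in samples]

# Hypothetical stand-ins for the peak and for the measured instrument function.
peak = [math.exp(-0.5 * ((t - 15) / 3.0) ** 2) for t in range(40)]
instrument = normalize([math.exp(-k / 2.0) for k in range(20)])
general = convolve(peak, instrument)
```

With a unit-sum instrument function the area under the peak is unchanged, while the crest is delayed and the trailing edge is stretched.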

As described with respect to Table 1, a range of general chromatographic functions would be
generated, i.e., a number of EMG functions would be convolved with the instrument function. Parameters
that characterized each of the generalized chromatographic functions would be implemented in the lookup
table. Using a variable GRATIO, equivalent to the CRATIO defined above for only EMG functions, the
parameters used to characterize a measured peak would be selected based upon the value of variable
GRATIO and interpolation within the lookup table. The use of the generalized lookup table would be totally
equivalent to that discussed above with respect to CRATIO. Alternatively, the lookup table for the
generalized chromatographic function would provide an initial estimate of parameters for a Simplex analysis,
similar to that described below.

The use of a lookup table based upon the signature ratio provides a means for characterizing a peak
based upon data with the best signal-to-noise ratio, as described previously, because the data used to
generate the signature ratio is about the peak crest. In this embodiment, the characterizing parameter set
[...]

[...] below, the neural net has been trained to generate EMG parameters σ, τ, t_g and the peak amplitude which
best represent an EMG peak. As used herein the EMG parameters σ, τ, t_g, h are the characterizing set of
parameters. The peak area and other peak characteristics of interest are generated using this set of EMG [...]
data selected extends generally from t_g - 5σ to t_g + 5σ, where σ is the standard deviation associated with
the peak and t_g is the time of the maximum displacement, i.e., the peak crest, of the Gaussian peak. Using
this criterion, starting point 104_2 multiplies the data bunch size from bunch size 104_1 by a constant, 16 in
one embodiment, and subtracts the product from the location of the peak center to define the starting point,
i.e., the initial data bunch for the peak analysis.

With the initial data bunch specified, a total number of data bunches are extracted from the raw data by
extract raw data 104_3 so that the total range of data is within about the ±5σ range. The specific
implementation of extract raw data 104_3 is dependent upon calculate parameters 105. If calculate
parameters 105 processes a single peak at a time, extract raw data 104_3 multiplies the constant used in
starting point 104_2 by two and adds four data bunches to the product. The resulting number represents the
number of data bunches, each data bunch having the width defined by bunch size 104_1, selected by extract
raw data 104_3. For a constant of 16 in starting point 104_2, thirty-six data bunches are selected from the raw
data by extract raw data 104_3.



If calculate parameters 105 processes a pair of fused peaks, a wider range of data is selected by
extract raw data 104_3. In this embodiment, the constant from starting point 104_2 is multiplied by four and
two is added to the product by extract raw data 104_3. Hence, if the constant is again 16, extract raw data 104_3
selects 66 data bunches, each with the width defined in bunch size 104_1, starting with the bunch defined by
starting point 104_2.
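The arithmetic of starting point 104_2 and extract raw data 104_3 can be sketched as below. The function name `select_window` and its signature are hypothetical; the constants follow the text.

```python
def select_window(peak_center, bunch_size, fused=False, constant=16):
    """Return (start_time, number_of_bunches) for the data to extract.

    Start: the peak center minus constant * bunch_size.
    Count: 2 * constant + 4 for a single peak, 4 * constant + 2 for a
    pair of fused peaks.
    """
    start = peak_center - constant * bunch_size
    count = 4 * constant + 2 if fused else 2 * constant + 4
    return start, count
```

With the constant set to 16, a peak centered at time 100.0 with a bunch size of 0.5 gives a starting point of 92.0 and either 36 or 66 data bunches.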



The processing used in adjust raw data 104_4 to remove contributions from adjacent peaks is also
directly dependent upon the method used in calculate parameters 105. If calculate parameters 105
characterizes only resolved peaks, adjust raw data 104_4 (Fig. 14) includes the steps of (1) testing for a prior
peak 104_4A (Fig. 15); (2) removing prior peak 104_4B; (3) testing for a trailing peak 104_4C; (4) estimating
trailing peak EMG parameters 104_4D; and (5) removing trailing peak 104_4E.



After each resolved peak is processed by calculate parameters 105 (Fig. 7) in this embodiment, if
classify extrema 102 indicated that fused peaks are being processed, prior peak 108 (Fig. 7) sets a prior
peak flag. Hence, adjust raw data prior peak test 104_4A (Fig. 15) checks the prior peak flag to determine
whether the peak is preceded by a prior peak. If no prior peak exists, processing transfers to the trailing
peak test 104_4C. If a prior peak exists, the selected raw data is processed by remove prior peak 104_4B. The
EMG parameters for the prior peak have been determined. Accordingly, using the EMG parameters and
each of the selected data points, a trace for the prior peak is calculated and subtracted from the raw data.

Specifically, using the EMG parameters for the prior peak, the EMG function G(t) (see Equation 4) is
evaluated at each time interval over the time range of the raw data. EMG function G(t) includes an integral
that must be evaluated for each time point. However, in a preferred embodiment, a series expansion given
in M. Abramowitz and I. A. Stegun, Editors, Handbook of Mathematical Functions, Applied Mathematics
Series No. 55, National Bureau of Standards, Washington D.C., p. 932 (1964) is used in Equation 4 to
evaluate the integral. The series expansion for the integral is:




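for z ≥ 0

$$ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-y^2/2}\,dy = 1 - \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5) + \varepsilon(z) \tag{9} $$

$$ t = \frac{1}{1 + p\,|z|}, \qquad |\varepsilon(z)| < 7.5 \times 10^{-8} $$

$$ p = 0.2316419 $$

$$ b_1 = 0.319381530 \qquad b_4 = -1.821255978 $$
$$ b_2 = -0.356563782 \qquad b_5 = 1.330274429 $$
$$ b_3 = 1.781477937 $$

for z < 0

$$ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-y^2/2}\,dy = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5) - \varepsilon(z) $$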
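The series expansion of Equation 9 can be checked numerically. The sketch below uses a hypothetical function name `normal_cdf` and assumes, by the symmetry of the integrand, that t is formed from the absolute value of z on both branches.

```python
import math

P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def normal_cdf(z):
    """Approximate the integral of Eq. 9 (standard normal distribution)."""
    t = 1.0 / (1.0 + P * abs(z))
    # b_1*t + b_2*t^2 + ... + b_5*t^5
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    tail = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * poly
    return 1.0 - tail if z >= 0 else tail
```

Comparing against the error function from the Python standard library, `0.5 * (1 + math.erf(z / math.sqrt(2)))`, gives a quick check that the coefficients were transcribed correctly.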

After the contribution of the prior peak is removed from the selected raw data, processing passes to
trailing peak test 104_4C. Trailing peak test 104_4C tests the integer returned by classify extrema 102 to
ascertain whether a trailing peak is present. If a trailing peak does not exist, the raw data is ready for further
processing. However, if a trailing peak exists, estimate EMG parameters 104_4D calculates the EMG parameters
for the trailing peak.

First, estimate EMG parameters 104_4D calculates signature ratio CRATIO, as previously described.
Recall that signature ratio CRATIO is one of the peak parameters in the set of peak parameters containing
characterizing information that was generated by the previous operations. Table 1 lists the number of the
row in Column 1; signature ratio CRATIO in Column 2; and the skewness, expressed as τ/σ, in Column 3.
Column 3 through Column 5 are EMG peak definition parameters. As described below, interpolated peak
definition parameters, a second set of parameters, are used in combination with the set of peak parameters
containing characterizing information, a first set of parameters, to generate a set of characterizing
parameters, a third set of parameters, for each peak. Lookup Table 1 includes a range of values for the
signature ratio CRATIO and a set of peak definition parameters, τ/σ_i, sigma_i, height_i, rtime_i, for each value of
signature ratio cratio_i.
Alternatively, if maximum index 102_1 determines that the value of the index counter is greater than the value of
extrema counter minus two, maximum index 102_1 transfers processing to peak center check.

Peak center check examines the value of the index counter. If the value of the index counter
corresponds to the second extremum in the extrema array, processing passes to first extrema check.
However, if the index counter has a value greater than the location of the second extremum, processing
passes to delete extrema 102_8, which in turn shifts the extrema array so that the extremum corresponding to
the value of index counter is the second extremum in the array. This operation deletes all extrema in
locations prior to the value of the index counter minus one because these extrema are noise and do not
represent a chromatographic peak. The value of extrema counter is also updated to correspond to the
number of extrema in the array after the noise is deleted.

First extrema check determines (i) whether the control passed directly from maximum index 102_1
to peak center check, i.e., no processable peak was found, and (ii) whether the value of the extrema
counter is less than four. If either of these conditions is true, reinitialize reinitializes the extrema
counter, the threshold flag and the array used to store the accurate positions of the extrema for the peak
before going to return no processable peak, which returns control to the main program.

However, if the extrema count is four or greater and a processable peak was detected, first extrema check
passes control to a second extrema check, which in turn determines whether the value of the
extrema counter is greater than four. If the value of the extrema counter is four, the data in the extrema
array represents a resolved peak. Thus, second extrema check transfers to return resolved peak, which
in turn returns control to the main program.

If the value of the extrema counter is greater than four, second maximum index sets the value of
the index counter to the location of the third extremum in the extrema file. Peak identified 102_14 uses the
pattern recognition rules based upon the traces in Figs. 6A-6D to analyze the extrema at the location
corresponding to the value of the index counter. Processing transfers between second maximum index and
[illegible in the source] to identify the largest extremum that satisfies the pattern recognition tests of peak
identified 102_14.

Specifically, if the magnitude of the curvature of the second extremum is less than the magnitude of the
third extremum multiplied by two and the magnitude of the third extremum is greater than the magnitude of
the fourth extremum multiplied by two, the extrema are from highly fused peaks of different sign. If the
fourth extremum is greater than the third extremum or the fourth extremum is greater than the fifth
extremum, the extrema are from highly fused peaks of the same sign. If the product of the sign of the fourth
extremum and the minmax array value for the fourth extremum is greater than zero, the extrema represent
slightly fused peaks of different signs. If none of these criteria are satisfied, the extrema represent slightly
fused peaks of the same sign.
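The four pattern recognition tests can be sketched as a decision cascade. Several points here are assumptions rather than statements of the source: the function name `classify_fused` is hypothetical, the tests are applied in the order given with the first match taken, "multiplied by two" is read as doubling the extremum on the right-hand side of each comparison, and the second test compares magnitudes.

```python
def classify_fused(magnitudes, sign4, minmax4):
    """Classify a fused peak pair from curvature extrema.

    magnitudes: curvature magnitudes of the extrema, first extremum at index 0
    sign4: sign (+1 or -1) of the fourth extremum
    minmax4: minmax array value (+1 or -1) for the fourth extremum
    """
    second, third, fourth, fifth = magnitudes[1:5]
    if second < 2 * third and third > 2 * fourth:
        return "highly fused, different sign"
    if fourth > third or fourth > fifth:
        return "highly fused, same sign"
    if sign4 * minmax4 > 0:
        return "slightly fused, different sign"
    return "slightly fused, same sign"
```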

In identify case 102_15, the value of the index counter for the largest extremum which satisfies one of
the pattern recognition tests is used to generate an integer identifying the fused peaks represented in the
extrema file, and return fused peaks provides this integer value to the main program. While in this
embodiment empirical relationships were determined in pattern recognition 92 (Fig. 5A) and incorporated in
analyze data 93 as described above for classify extrema 102, in another embodiment, a neural net is
trained, using a procedure similar to that described below, in pattern recognition 92 to determine the fused
peak case and then the neural net would be used in classify extrema 102 instead of the empirical pattern
recognition rules. Further study is required to implement the neural net analyses. In addition, other pattern
classification or recognition means known to those skilled in the art could be adapted for use in pattern
recognition means 95 of this invention. For examples of other pattern classification or recognition means,
see for example Massart et al., Chemometrics: a textbook, Elsevier, Amsterdam (1988).

The identification of the characteristics of the chromatographic data using the extrema of the second
time derivative provides a unique capability for analysis of peak data in general. The steps of: (i) data
characterization 90 (Fig. 5A) and attribute definition 91; (ii) development of pattern recognition means 95 for
the selected attribute; (iii) analysis of raw data using multiple digital filters to locate and identify the second
time derivative extrema; and (iv) using the pattern recognition means to classify the extrema can be used in [...]

[...] assumptions have been made about the baseline. Rather, the raw data is analyzed and identified as being
one of a number of possible combinations. Therefore, the results of the analysis are not biased by the prior
subtraction of a baseline from the data. This method provides the user with precise information about the [...]







EP 0 395 481 A2 

without introducing a bias in the data. Prior art methods such as those of Foley and Dorsey, described
above, required not only a prior baseline correction, but also assumptions about the relationships of the
fused peaks.

After the extrema have been classified by classify extrema 102, processing transfers to measure
extrema 103 (Fig. 7). In one embodiment, measure extrema 103 precisely determines the location of the
zero crossings and the location of the maximum/minimum of the peak. As used herein, the phrase
"maximum/minimum" means the maximum for a positive peak and the minimum for a negative peak. The
precise location of these parameters is necessary, as explained below, to obtain good peak parameter
estimates when a neural net is used in calculate parameters 105. Also, if a lookup table is used with a
parameter such as signature ratio CRATIO, the accuracy of the value of CRATIO depends upon the
accuracy with which the zero crossings and the minimum/maximum are determined. However, measure
extrema 103 consumes a significant amount of computation time and therefore offsets the processing speed
advantage of neural nets. In another embodiment, measure extrema 103 is not utilized and processing
transfers directly from determine case 102 to select data 104.

The integer value provided by determine case 102 is used to identify the maximum/minimum index
location in the extrema array. This value is used to identify the index for the raw bunch data which is
processed by measure extrema 103. Using the raw data, values of the curvature are calculated using Eq. 8.
To find the zero crossings of the curvature, the data bunches that bracket the zero crossing, i.e., one data
bunch having a positive value of the curvature and an adjacent second data bunch having a negative value
of the curvature, are found. After location of data bunches that bracket the zero crossing, three point
polynomial interpolation is used to precisely estimate the location, i.e., the time, of the zero crossings.
Similarly, a three point polynomial interpolation is used to precisely locate the time and magnitude of the
min/max of the curvature. Polynomial interpolation is well known to those skilled in the art.
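Three point polynomial interpolation amounts to fitting a quadratic through three (time, curvature) samples. The sketch below is one possible realization, with hypothetical function names; it assumes the three samples span the zero crossing or the extremum being located and that the time values are distinct.

```python
import math

def quadratic_through(p0, p1, p2):
    """Coefficients (a, b, c) of y = a*x^2 + b*x + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d1 = (y1 - y0) / (x1 - x0)                      # first divided difference
    d2 = ((y2 - y1) / (x2 - x1) - d1) / (x2 - x0)   # second divided difference
    return d2, d1 - d2 * (x0 + x1), y0 - d1 * x0 + d2 * x0 * x1

def locate_extremum(p0, p1, p2):
    """Time and value at the vertex of the interpolating quadratic."""
    a, b, c = quadratic_through(p0, p1, p2)
    if a == 0.0:
        raise ValueError("points are collinear; no extremum")
    return -b / (2.0 * a), c - b * b / (4.0 * a)

def locate_zero_crossing(p0, p1, p2):
    """Root of the interpolating quadratic between the outer two points."""
    if p0[1] * p2[1] > 0.0:
        raise ValueError("outer points do not bracket a sign change")
    a, b, c = quadratic_through(p0, p1, p2)
    if a == 0.0:
        return -c / b  # the three points lie on a straight line
    root = math.sqrt(b * b - 4.0 * a * c)
    candidates = ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))
    lo, hi = min(p0[0], p2[0]), max(p0[0], p2[0])
    return next(r for r in candidates if lo <= r <= hi)
```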

An important aspect of measure extrema 103 is the data bunch size used in the calculation. In this
embodiment, the size of the data bunches is adjusted so that there are ten data bunches between the zero
crossings, i.e., the inflection points of the peak. Empirical experience has demonstrated that about 10 to 20
data bunches across the entire peak are optimal. If fewer data bunches are used, the accuracy of the
processing is diminished. If more data bunches are used, the effect of noise is amplified.

The general framework for processing the data described above, which includes (i) multiple digital filters
to determine the extrema of the second derivative, (ii) classification of the peak type by analysis of second
derivative extrema, and (iii) measurement of the precise location of the second derivative extrema, provides
information that may be processed in a number of ways so as to generate a set of characterizing peak
parameters for each peak detected.

Unlike prior art methods, this sequence of data processing is not dependent upon subtraction of a
baseline from the raw data prior to processing. Thus, in contrast to the prior art methods, a wandering or
indeterminate baseline does not limit the application of this general framework. Moreover, this processing
provides a set of peak parameters that contain characterizing information. Specifically, in one embodiment,
the peak height, the time of the peak height, the peak width, as measured by the distance between zero
crossings of the second time derivative, and peak skewness, as measured by the signature ratio CRATIO,
are known.

In one embodiment, as described with respect to Figs. 8A-8D above, fused peaks are resolved into one
or more resolved peaks by select data 104 and each of the resolved peaks is processed in turn by calculate
parameters 105. In another embodiment, select data 104 removes any prior peak, for example, peak 114 in
Fig. 8D, and then, using the data from classify extrema 102 and measure extrema 103, defines a set of raw
data corresponding to fused peaks, for example peaks 115 and 116 in Fig. 8D, for processing by calculate
parameters 105. The precise processing used by select data 104 is dependent upon the method used in
calculate parameters 105 to generate a set of parameters for each peak.

However, the general structure of select data 104 is independent of the process used in calculate
parameters 105 and only the specific implementation of the general structure varies from application to
application. As shown in Fig. 14, select data 104 is a multistep process. The process includes the steps of: [...]

[...] derivative ranges from about six to about twelve. Therefore, in one embodiment bunch size 104_1 subtracts
the first zero crossing of the curvature from the second zero crossing of the curvature and divides the
difference by twelve to select the data bunch size.
extrema counter is incremented so that the next extremum which is found is stored in the appropriate
location.

The threshold counter, which is also set to zero by initialize extremum variables 101_16, is used, as
described more completely below, to determine when the raw data remains at or below the threshold level
for a selected period of time. The minmax array is used in identification of the extrema by classify extrema
102. Briefly, as shown in Figs. 6A-6D, the minimum and maximum of the second time derivative oscillate
between a negative and a positive value. For each extremum after the first, the value stored in the minmax
array is the value in the minmax array for the previous extremum multiplied by -1. After initialize extremum
variables 101_16, processing transfers to save curvature 101_30, which stores the magnitude and the sign of
the curvature for the filter being processed.

As previously described, if the curvature is not greater than the threshold for a selected period of time,
processing transfers directly from threshold calculation 101_15 to save curvature 101_30 without initialization of
the extrema variables.

If the curvature was greater than the threshold for at least two time intervals so that initialize extremum
variables 101_16 has set the threshold flag, processing passes from threshold check to new extremum
check 101_17 rather than to curvature sign check. New extremum check 101_17 calculates the
difference in the curvature from calculate curvature 101_7 and the curvature stored in the extrema arrays at
the current value of the extrema counter. This difference is multiplied by the value stored in the minmax
array at the current value of the extrema counter. If the resulting product is greater than the scaled
threshold and if the stored curvature was generated by the same filter as the current filter or an adjacent
filter, processing transfers to update extremum variables 101_18; otherwise processing transfers to first
extremum check 101_19.

Update extremum variables 101-18 stores the current curvature from calculate curvature 101-7 in the
extrema array at the position indicated by the value of the extrema counter. Similarly, the index of the data
bunch and the sign of the curvature are stored in the extrema bunch array and the sign array respectively.
If the value stored in the minmax array at the location having the same index as the current value of the
extrema counter is not equal to the sign of the curvature, the data bunch index is stored in the zero
crossing array.

Hence, new extremum check 101-17 determines (i) whether the current value of the curvature has
exceeded the previously stored value of the curvature by a significant amount, i.e., the current value of the
curvature is a better estimate of a second derivative extremum than the stored curvature, and (ii) whether the
current curvature was generated by a filter with a specified relationship to the filter that generated the stored
curvature. If both these conditions are true, the current value of the curvature represents a better estimate of
a second derivative extremum and hence the stored extremum estimate is updated by update extremum
variables 101-18.
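
The two conditions applied by the new extremum check can be sketched in Python as follows. This is a minimal illustration and not the program of the embodiment; the function name, the argument names, and the use of integer filter indices to decide whether two filters are adjacent are assumptions made for the example.

```python
def is_better_extremum(current, stored, minmax, scaled_threshold,
                       current_filter, stored_filter):
    """Return True when the stored extremum estimate should be replaced.

    current, stored  -- signed curvature values (hypothetical names)
    minmax           -- +1 or -1, the expected polarity of this extremum
    current_filter, stored_filter -- integer filter indices (assumed)
    """
    # Condition (i): the difference, oriented by the minmax polarity,
    # must exceed the scaled threshold.
    improvement = (current - stored) * minmax
    # Condition (ii): the same filter or an adjacent filter produced
    # the stored curvature.
    related = abs(current_filter - stored_filter) <= 1
    return improvement > scaled_threshold and related
```

Multiplying by the minmax value lets one comparison serve both a maximum (the curvature must grow) and a minimum (the curvature must become more negative).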

First extremum check 101-19 ascertains whether the value of the extrema counter is greater than zero,
indicating that one or more extrema have been located. If the first extremum is being processed, i.e., the
extrema counter has the value zero, processing passes to post extremum check 101-27; otherwise
processing passes to filter check 101-20. Filter check 101-20 determines whether the current filter is the same
filter that detected the last stored extremum. If the current filter is not the same filter which detected the last
stored extremum, processing passes to post extremum check 101-27. However, if the filters are the same,
processing transfers from filter check 101-20 to curvature check 101-21. If the magnitude of the curvature is
less than the threshold, curvature check 101-21 transfers to increment threshold counter 101-22, which in turn
increments the threshold counter. Conversely, processing transfers to initialize threshold counter 101-23,
wherein the threshold counter is initialized to zero.
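
The counter update described above can be sketched as a single function. The names are hypothetical and the sketch covers only the increment and reset behavior, not the surrounding flow of control.

```python
def update_threshold_counter(counter, magnitude, threshold):
    """Count consecutive data bunches whose curvature magnitude is below threshold."""
    if magnitude < threshold:
        return counter + 1   # curvature still below the threshold
    return 0                 # curvature at or above the threshold: start over
```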

Threshold counter check 101-24 determines whether the curvature for eight consecutive data bunches
has remained below the threshold. If the value of the threshold counter is greater than eight, a baseline has
been detected. As previously described, the baseline is considered an extremum, and so an extremum has
been found. Thus, update extrema arrays-baseline found 101-25 stores the characteristics of the extremum
corresponding to the baseline. Specifically, the extrema counter is incremented, the sign of the curvature is

curvature; (ii) the current filter is the same as the one which detected the stored curvature and (iii) an
extremum has been detected. If an extremum has not been detected, and conditions (i) and (ii) are true, an
extremum has been found, i.e., an extremum is found when the signal from the filter which generated the

update extrema arrays 101-28 and the extremum found flag is set by extremum found 101-29.

If the three conditions are not satisfied, post extremum check 101-27 passes to save curvature 101-30.
Save curvature 101-30 stores the curvature and the sign of the curvature so that these values are available
for the next filter. Finally, the filter window is increased by a factor of 2 by increase filter window 101-31. If
the new time window is less than the maximum, increase filter window 101-31 transfers to modulus check
101-3. If the maximum time window has been exceeded, processing transfers to the main program from
increase filter window 101-31.

When processing transfers from locate extrema 101 (Fig. 7) back to the main program and an
extremum has been found, the main program passes control to classify extrema 102. Locate extrema 101
has generated an extrema file consisting of the arrays of extrema data. The extrema file contains
information for each extremum which has been detected and not yet identified as belonging to a peak. For
each extremum in the file the curvature, the sign of the curvature, the flag to indicate whether the extremum 
should be a positive or negative extremum, and the index of the data bunch for the extremum are stored. 

Baseline check 102- (Fig. 13) in classify extrema 102 establishes whether locate extrema 101 found a
baseline. If a baseline has been detected, a first extrema counter check 102z ascertains whether the value
of the extrema counter is greater than or equal to four. If fewer than four extrema have been detected,
sufficient data are not available to identify a peak, as previously explained. Thus, when fewer than four
extrema have been detected, processing transfers to return no processable peak 102 ^ and subsequently
back to the main program, and the next data point is supplied to locate extrema 101.
If a baseline has not been found, baseline check 102- transfers to a second extrema counter check
102:, which in turn ascertains whether the value of the extrema counter is greater than or equal to eight. If
eight or more extrema are contained in the extrema file, at least one peak is contained in the data.
Therefore, if the value of the extrema counter is greater than or equal to eight, extrema counter check 102^
transfers control to maximum index check 102a; otherwise control transfers to return no processable peak
102-.
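
The two extrema counter checks amount to one rule with two limits, which can be sketched as follows. The function and argument names are hypothetical; only the limits of four and eight come from the description above.

```python
def enough_extrema(baseline_found, extrema_count):
    """Return True when the extrema file holds enough extrema to identify a peak."""
    # With a baseline detected, four extrema suffice; without one, eight are required.
    required = 4 if baseline_found else 8
    return extrema_count >= required
```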

After sufficient extrema are located to identify at least one peak, classify extrema 102 must identify the 
pattern of the second curvature extrema and determine which of the second derivative traces in the 
minimum set have been detected by locate extrema 101 and stored in the extrema file. Pattern recognition 
means 95 (Fig. 5A) for the extrema of the second derivative is incorporated in classify extrema 102. The
empirical rules of pattern recognition means 95 are based upon empirical observations of the relationships
between the second derivative extrema for a specific fused peak combination as well as between second 
derivative extrema for different fused peak combinations. 

Specifically, the first peak in the extrema file should be centered at the second location in the extrema
file so that processing starts on the second extremum. Thus, maximum index 102a initializes an index
counter to the second location in the extrema file. Peak identification 102; uses a series of tests for each
value of the index counter to determine where the center of the peak is located. These pattern recognition
tests are based upon the characteristics of the traces shown in Figs. 6A-6D. For each extremum, the
minmax variable and the sign of the curvature should be the same so that their product should be a number
greater than zero. Thus, the first test in peak identification 102b multiplies the minmax variable and the sign
of the extremum and determines whether the product is greater than zero.

If the product of the minmax variable and the sign of the curvature is greater than zero for the second
extremum, the peak is properly centered and processing transfers to peak center check 102^. If the product
of the minmax and the sign of the curvature for the second extremum is less than zero, the extrema in the
extrema array are sequentially processed to determine which extremum represents the minimum maximum.
Specifically, after checking the second location in the extrema array, maximum index 102; increments the
index counter and checks whether the index counter is less than the value of the extrema counter minus
two. Assuming the check is satisfied, control again passes to peak identification 102^.

For each extremum after the second, the product of the minmax variable and the sign of the curvature
is also checked to determine whether the product is positive. If the product is negative, control passes back
to maximum index 102^, which increments the index counter. However, if the product is positive, an



•j ',1'emun musi lo greater man tne magnr.uut 1 c: \nv .<ju'm ex.remuc. muiju'icC u } a^tptancc; o' 

the fourth extremum, the fourth extremum must be greater than the third extremum or the fourth extremum
must be greater than the fifth extremum. For the fifth extremum, the product of the sign of the fourth
extremum and the minmax variable for the fourth extremum must be greater than zero. When one of these

After locate extrema 101 processes curvature G_0(15), locate extrema 101 shifts to the next filter G_1(t)
with a time window 2t_1. The current data bunch remains 17t_1 and 17t_1 is not evenly divisible by 2t_1 so that
test (iv) returns processing to the main program, indicating that no extremum was found. The reason for
returning processing is that, in this embodiment, the leading edges of the filters are kept aligned and
sufficient data is not yet available to define data bunch 161-9.

The main program again calls locate extrema 101 and the data bunch index is pointing at the
eighteenth data bunch 160-18. Since 18t_1 is evenly divisible by t_1, data bunch 160-18 is loaded in the data shift
register for filter G_0(t) and curvature G_0(16) is calculated. After locate extrema 101 processes curvature G_0(16),
locate extrema shifts to the next filter G_1(t) with a time window 2t_1. Since 18t_1 is evenly divisible by 2t_1,
data bunch 161-9 is loaded in the data shift register for filter G_1(t) and curvature G_1(7) is calculated. Notice
that the forward time edge for curvature G_1(7) is aligned with the forward time edge for curvature G_0(16).

After locate extrema 101 processes curvature G_1(7), locate extrema shifts to the next filter G_2(t).
However, 18t_1 is not evenly divisible by 4t_1 so that filter G_2(t) is not processed, and test (iv) returns
processing to the main program indicating that no extremum was found.

The main program again calls locate extrema 101 with the data bunch index pointing at the nineteenth
data bunch 160-19. Since 19t_1 is evenly divisible by t_1, curvature G_0(17) is calculated. However, 19t_1 is not
evenly divisible by 2t_1 so that processing is again transferred back to the main program by test (iv).

Upon the subsequent call to locate extrema 101, the data bunch index is pointing at the twentieth data
bunch 160-20. Since 20t_1 is evenly divisible by t_1, 2t_1 and 4t_1, curvatures G_0(18), G_1(8) and G_2(3) are each
sequentially calculated by locate extrema 101. Again the forward time edge for G_0(18) is aligned with the
forward edge of the curvature calculated by filter G_1(8), which is also aligned with the forward edge for
curvature G_2(3).
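
The worked example for the seventeenth through twentieth data bunches can be reproduced with a short sketch. The function name and the return format are assumptions made for illustration. Windows are expressed in units of the minimum bunch size, each filter window is taken to be twice the previous one, and each curvature is taken to be centered two bunches behind the leading edge of its filter.

```python
def curvatures_for_bunch(bunch_index, num_filters=3):
    """List (filter index, curvature index) pairs computed for one data bunch."""
    computed = []
    for i in range(num_filters):
        window = 2 ** i                  # window of filter i in minimum bunch sizes
        if bunch_index % window != 0:    # test (iv): not evenly divisible, so return
            break
        newest = bunch_index // window   # newest bunch on this filter's line
        center = newest - 2              # five-term filter: center trails the edge by two
        if center < 3:                   # filter not yet centered on its third bunch
            break
        computed.append((i, center))
    return computed
```

For bunch indices 17, 18, 19 and 20 the sketch yields G_0(15); G_0(16) and G_1(7); G_0(17); and G_0(18), G_1(8) and G_2(3), matching the sequence described above.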

The purpose of the fourth test, as demonstrated in the above example, is to keep the filters used in
locate extrema 101 properly aligned so that the leading edge of each filter remains aligned. This alignment
results in the most reliable means for locating curvature extrema of the data. However, extrema can also be
located by maintaining alignment of the centers of each filter used. Thus, according to the principles of this
invention a portion of each digital filter is aligned with a portion of each of the other digital filters so as to locate
and identify extrema of the second derivative, as explained below. Moreover, as previously explained, the
second time derivative is an attribute of the chromatogram and the extrema of the second time derivative
are characteristics of that attribute. Thus, the digital filters, as described herein, determine characteristics of
an attribute of measured data.

The data bunching and the time windows used for curvature filter G_i(t) are important aspects of this
invention. If only a very small time window, i.e., narrow data bunches, was used, any noise on the
chromatographic trace would be effectively amplified and the analysis of the data would be difficult.
Similarly, if only a very broad window was utilized, information would be effectively averaged over the
window and information may be lost. Accordingly, curvature filter G_i(t) is used sequentially with an
increasing window size, as illustrated in Fig. 11, and the curvature calculated by adjacent filters is compared
to identify extrema of the curvature.

For example, in Fig. 11, a calculated curvature for five data bunches in line 161 is compared with a
curvature calculated for five data bunches in line 160 and with a curvature calculated for five data bunches
in line 162. The output signal from each of the filters, the curvature, is scaled as described below so that 
the filter output signals can be compared directly. Specifically, measures of the curvatures produced by the 
different filters are compared. 

Two comparisons of the curvature measures are made. A first comparison determines whether the
measure calculated for the group of data bunches being processed by the filter is a better estimate of an
extremum of the curvature than the prior best estimate of the extremum of the curvature which has been
stored. A second comparison determines whether the stored best estimate of an extremum represents an
extremum of the curvature. For example, in Figs. 3A and 3C, as the filters process the data corresponding to
peak 60, the stored best estimate of first extremum 62< increases as the data bunches being processed
move towards the first extremum, but when the calculated curvature moves past the first extremum the

This operation corresponds to initializing the data shift register, described previously, for each filter. This
initialization is not an essential aspect of the invention, but, as described below, storing data used in
Equation 8 minimizes data input and output to locate extrema 101.

no extremum was found by initialize return value 101-1. Time window 101-2 sets the first filter time window to
the minimum bunch size. Modulus check 101-3 determines whether the index for the current data bunch is
an integer multiple of the time window for the current filter, as described above. If the data bunch is not an
integer multiple of the time window, processing returns to the main program. Otherwise, the data bunches
for the last four terms on the right hand side of Equation 8 are shifted one position to the left so that g_i(t-1)
becomes g_i(t-2), g_i(t) becomes g_i(t-1), g_i(t+1) becomes g_i(t) and g_i(t+2) becomes g_i(t+1) by adjust data
bunches 101-4. This operation prepares the current filter data shift register for updating with the most recent
data bunch. This manipulation of the terms used in the filter eliminates retrieving all five terms from an
external storage device each time locate extrema 101 calculates a curvature. New data bunch 101-5
retrieves a new data bunch which is assigned to g_i(t+2) (see Equation (8)).
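
The shift-and-load operation on the five terms of Equation 8 behaves like a fixed-length queue. The sketch below uses Python's `collections.deque` as a stand-in for the data shift register; the helper names are hypothetical and the embodiment's actual data structures are not specified here.

```python
from collections import deque


def make_register():
    """Create an empty five-term data shift register for one filter."""
    return deque(maxlen=5)


def load_bunch(register, bunch):
    """Shift the register one position and load the newest bunch in the last position.

    Returns True once all five terms of the filter are defined.
    """
    register.append(bunch)   # the oldest term is dropped automatically when full
    return len(register) == 5
```

Only the newest data bunch is read on each call, which is the saving in data input described above.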

After the terms in G_i(t) are defined, the index for the data bunch being processed is checked by initiate
filter processing 101-6 to determine whether the data bunch is the third or greater data bunch for the current
filter. If the data bunch is the first or second data bunch for the filter, processing returns to the main
program. In this case, unless an extremum was found, as described below, the main program increments
the data bunch index to the next data bunch and then returns to locate extrema 101.

If the data bunch is the third or greater data bunch for the filter, all the terms in G_i(t) are defined. Thus,
calculate curvature 101-7 determines the value of G_i(t).

After calculate curvature 101-7, characterize curvature 101-8 determines the sign of the curvature and the
magnitude of the curvature. The magnitude of the curvature is processed by scale curvature 101-9.
Scale curvature 101-9 scales the curvature magnitude so that the magnitude of the output signal for
each filter is the same for the same average curvature. Recall that the curvature filter time window for each
filter is a power of two times the minimum bunch size. The scale factor is 2 i6pr2 where p = 1n? (working
bunch sizet,) and t, is the minimum bunch size.

In subsequent steps, described more completely below, the scaled curvature is used with the sign of
the curvature to identify extrema of the second time derivative. After calculation of a scaled magnitude by
scale curvature 101-9, extremum found 101-10 checks the status of the extremum found flag. If an extremum
has been found, control passes to save curvature 101-30. Otherwise, a curvature threshold flag is checked
by threshold check 101-11 to determine whether the curvature has moved above the threshold for a selected
time period.

If the curvature threshold flag is zero, the sign of the curvature for the previous data bunch is compared
with the sign of the curvature for the current data bunch by curvature sign check 101-12. If curvature sign
check 101-12 detects a difference in the sign of the curvature between adjacent data bunches, a sign
counter is initialized by initialize sign counter 101-13 and processing transfers to save curvature 101-30, which
stores the magnitude and the sign of the curvature for the current filter.

If curvature sign check 101-12 does not detect a change in the sign of the curvature between adjacent
data bunches, the sign counter is incremented by increment sign counter 101-14. After increment sign
counter 101-14, threshold calculation 101-15 compares the magnitude of the curvature with the threshold. The
magnitude of the curvature for the previous data bunch is also compared with the threshold. If both the
current and the previous curvatures are greater than the threshold, and the value of the sign counter is
greater than two, processing transfers to initialize extremum variables 101-16; otherwise processing transfers
to save curvature 101-30.
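
The sign counter and threshold tests described above can be combined into one small sketch. The function name and return format are hypothetical, and resetting the sign counter to zero on a sign change is an assumption, since the text states only that the counter is initialized.

```python
def check_threshold_crossing(sign_changed, sign_counter, current, previous, threshold):
    """Return (new_sign_counter, start_search) for one data bunch.

    current and previous are curvature magnitudes for the current and
    previous data bunches (hypothetical argument names).
    """
    if sign_changed:
        return 0, False          # assumed: "initialized" means reset to zero
    sign_counter += 1
    # Both magnitudes above threshold and the sign stable for more than two bunches.
    start_search = current > threshold and previous > threshold and sign_counter > 2
    return sign_counter, start_search
```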

In initialize extremum variables 101-16, several counters and other variables are initialized because the
raw data has risen above the threshold level for a specified period of time and so the process of
determining the location of the extremum must be initialized. Specifically, an extrema counter and a
threshold counter are set to zero. The curvature threshold flag is set to indicate that the curvature has
exceeded the threshold for a specified time period. The initial position in the zero crossing array and in the
extrema bunch array are set to the index for the current data bunch. The zero crossing array is used to
store the indices of the data bunches when the curvature passes through zero. Similarly, the extrema bunch
array is used to store the indices of data bunches corresponding to the extrema of the curvature. The initial
position in the extrema curvature array is set to the magnitude of the curvature because at this point, the

The extrema counter, which was set to zero by initialize extremum variables 101-16, is used to identify
the storage position in the extrema arrays for the extrema of the second time derivative found by locate
extrema 101. The first extremum is located at the zero position in the extrema arrays, the second extremum
at the first position and so forth. After an extremum is found and data characterizing the extremum are

In one embodiment, chromatographic data is obtained using a Spectra-Physics gradient liquid
chromatography pump sold under Spectra-Physics part No. SP8800-01. The pump is used with a
chromatographic column and a Spectra-Physics liquid chromatography UV-visible detector with single/dual
wavelength and scanning modes, ratio output, timed wavelength changes, and automatic fraction collector
advance and sold under part No. SP8490-010. The detector is connected to Spectra-Physics computing
integrator model No. SP4270, Part No. SP4270-010. The computing integrator performs the A/D conversion
and data sample generation, explained previously, and is connected to a microcomputer having WINner
workstation software package, supplied by Spectra-Physics under part No. WINner-010, installed and
operating.

The specific Spectra-Physics components and software described herein are not essential to this
invention and are provided only to indicate commercially available apparatus that may be used to generate
data for processing according to the principles of this invention. These instruments are described in a
publication entitled "Chromatography Instruments and Systems," available from Spectra-Physics, Auto Lab
Division, 3333 North First Street, San Jose, California 95134.

The above Spectra-Physics automated chromatography system provides (i) compressed data and (ii)
method files which specify data needed for integration and generation of summary reports from the
compressed raw data. Typically, the compressed raw data are stored in data bunches having a width of
about 0.1 seconds.

Since the automated system generates compressed raw data, prior to processing the data according to
the principles of this invention the compressed raw data are retrieved from the hard disk by preprocessing
software and processed to form a series of contiguous data bunches with each data bunch having the same
time interval. The time interval of the data bunch is referred to as a time window. Each data bunch has a
single index to identify the bunch.

In addition to defining the total number of data bunches, the number of data samples in a bunch, and 

the time width of the minimum bunch size, the preprocessing software also requires the user to specify the
threshold value and the number of filters, sometimes called detectors, to be used. The preprocessing
software is dependent upon the measurement apparatus and therefore is probably different for different
apparatus. The important aspect of any preprocessing package is that the package provides the data 
initialization items described above. 

While this invention may be implemented with hardware to process the digitized time series data
representing the chromatogram, in one embodiment the processing is performed by a computer program in
microcomputer 153 (Fig. 10A). The computer program was written in the C programming language.
However, the specific language used is unimportant and any high level computer language or even
assembly language for a specific microprocessor could be used to implement the principles of this
invention as described herein.

The five steps in the peak characterization process, as illustrated in Fig. 7, are included within a main
program which controls the input data and the overall processing. As previously described, the time series
of digital data is initially processed by locate extrema 101 (Fig. 7). However, prior to initiation of locate
extrema 101, several variables, which are used in the analysis, are initialized in the main program.
Specifically, a threshold level is obtained from the preprocessing routines. A preprocessing package is not
required to provide the initialization data for the main program. The initialization data, in another
embodiment, is coded directly in the main program.

When the digitized data has a value less than the threshold level for a selected period of time, a
baseline is defined. The threshold level is not a fixed number which can be used for all analyses of
chromatographic data, but rather the threshold must be determined for each series of chromatographic
measurements obtained with a specified experimental apparatus.

Typically, a level portion of a chromatogram may be used to define the threshold level as previously
described. Alternatively, a threshold level is defined by passing an eluent through a chromatographic
column, a blank run, and quantifying the detector output signal for the blank run. Other means for defining a
threshold level are known to those skilled in the art.

completely below, is passed to classify extrema 102.

As previously described, according to the principles of this invention, locate extrema 101 uses a series
of digital filters to identify extrema of the second time derivative of the digitized data. Each of the digital

constant number of data points on either side of the selected data point.

Conceptually, a shift register holds the data points for each filter. The time spans of adjacent shift
registers differ by a factor of two. After calculation of the curvature for the data in a shift register, a new data
point is shifted into the shift register, and the earliest data point in time is shifted out of the shift register.

The embodiment of the digital filters described herein is not intended to limit the scope of the invention
but rather is illustrative only of the principles of the invention. In one embodiment, the digital filter was:

$$G_i(t) = 2 g_i(t-2) - g_i(t-1) - 2 g_i(t) - g_i(t+1) + 2 g_i(t+2) \tag{8}$$

Digital filter G_i(t) requires five data bunches, e.g., data from a five bunch shift register, to generate
curvature G_i(t) for point t. As used herein, the indices t-2, t-1, t, t+1 and t+2 are integers used to denote
data bunches. Each data bunch has a corresponding time window. The subscript i is used to denote the
filter. For i = 0, the filter time window has the smallest size and in one embodiment, for i = 6, the filter time
window has the largest size, which is sixty-four times greater than the smallest time window. The time
windows of adjacent filters differ by a power of two. In one embodiment, the number of filters used is
provided by the data preprocessing package, e.g., the number is user selected.
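
As an illustration of Equation 8, the following sketch applies the five-term filter to a list of data bunch values. The function name is hypothetical and Python's zero-based list indices stand in for the integer bunch indices; the unscaled filter output is returned.

```python
def curvature(bunches):
    """Apply the five-term filter of Equation 8 wherever all five terms exist."""
    return [
        2 * bunches[t - 2] - bunches[t - 1] - 2 * bunches[t]
        - bunches[t + 1] + 2 * bunches[t + 2]
        for t in range(2, len(bunches) - 2)
    ]
```

The coefficients sum to zero and have no first moment, so a constant or linearly sloping signal produces an output of zero, while a parabola g(t) = t*t produces a constant output of 14. The filter therefore responds only to the curvature of the data.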

Fig. 11 illustrates the relationship between digital filters G_i(t) used to identify extrema of the curvature
according to the principles of this invention. The data bunches for filters G_0(t), G_1(t), G_2(t) are represented
by the dashed lines 160, 161, 162. The lengths of the segments in a line, for example segments 160-1, 160-2,
160-3 of line 160, are the same and each segment represents the time size of the data bunch for the
filter. The time size of the data bunch for the filter is a characteristic interval for the filter. The lengths of the
segments in adjacent lines, for example segment 160-1 in line 160 and segment 161-1 in line 161, differ by a factor of
two. In fact, the data bunch for segment 161-1 is obtained by adding data bunches 160-1 and 160-2 together.
The other data bunches in line 161 are obtained from the data bunches in line 160, and similarly the data
bunches in line 162 are obtained from the data bunches in line 161.
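
The construction of each coarser line of data bunches from the line below it can be sketched as pairwise addition. The function name is hypothetical, and discarding an unpaired trailing bunch is an assumption made for the example.

```python
def coarsen(bunches):
    """Form the next line of data bunches by adding adjacent pairs together."""
    # An unpaired trailing bunch is left out until its partner arrives (assumed).
    return [bunches[k] + bunches[k + 1] for k in range(0, len(bunches) - 1, 2)]
```

Applying the function twice to the bunches of line 160 gives bunches four times as wide, corresponding to line 162.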

In this embodiment, the minimum time window size is t_1 and the maximum window size (not shown) is
64*t_1 for a total of seven different filter sizes. In Fig. 10, only the data bunches for the first three filters G_0(t),
G_1(t), G_2(t) are represented. The first filter has a time window width t_1. The second filter window has a
width of 2*t_1 and the third a width of 4*t_1. Thus, each filter has a different characteristic interval and the
characteristic intervals differ by an integer multiple.

In one embodiment, minimum time window t_1 for line 160 is defined by the preprocessing package.
Typically, a system such as that in Fig. 10A uses a default of about 0.1 second for the minimum bunch
size. If 60 Hz is used in A/D converter 151 (Fig. 10A) so that each data sample is 16 2/3 milliseconds, the
number of data samples is about 6 in the 0.1 second minimum bunch size. This embodiment is illustrative
only and is not intended to limit the scope of the invention. The important aspect is that the peak
processing system of this invention is provided the minimum time width for the data which is being
processed by the system.

For each set of data, as represented by lines 160, 161, 162 (Fig. 11), the curvature is not calculated
until filter G_i(t) is centered on the third data bunch G_i(3) because there are not sufficient data bunches to
calculate curvature G_i(t) until the third data bunch. Since chromatographic data scans typically have a
baseline preceding the first peak, this limitation should not result in the loss of any information. However,
with other filters or other applications of the curvature, average values or some other representation for two
data bunches to the left of the initial data bunch, 160-1, 161-1, 162-1, could be used so as to determine the
curvature for each of the measured data bunches. Similarly, the curvature is not calculated for the last two
data bunches in raw data 100 (Fig. 7). As used herein, the phrase "calculate the curvature for each data
point in a data set" means calculating the curvature for the data points for which the curvature is defined,
i.e., all the data points except the first two and last two in the data set.

As described more completely below, locate extrema 101 first calculates curvature G_0(t), and then
sequentially G_1(t) through the curvature for the last filter. The processing by locate extrema 101 is terminated
when (i) all the filters have processed the current data bunch and no extremum was found; (ii) an extremum
is located; (iii) a filter is encountered for which sufficient data bunches are not available to use the filter;
or (iv) the index of the
current data bunch times the minimum bunch size is not evenly divisible by the time window for the current

Assume that the data bunch index is currently pointing at the seventeenth data bunch 160-17 (Fig. 11) in the
raw data file. Since 17t_1 divided by t_1 is 17, the current data bunch 160-17 is evenly divisible by the time
window for filter G_0(t). Hence, the current data bunch is loaded in the data shift register for filter G_0(t) and

parameters for each resolved peak. A specific embodiment using the signature ratio CRATIO is described
below. Neural network, Simplex analyses or other curve fitting methods may be used individually or in
combination as described more completely below.

Independent of the method used to calculate parameters that characterize a peak, calculate parameters
105 is fundamentally different from the prior art methods described above. First, data including a baseline
signal is processed by calculate parameters 105 to generate a set of characterizing parameters and a
baseline estimate. In contrast, the prior art methods previously discussed estimated a baseline, corrected
the raw data with the baseline estimate and then used the baseline corrected data to estimate a set of
characterizing parameters. Thus, these prior art methods used a prior baseline correction while in my
invention, the set of peak characterizing parameters and a baseline estimate are determined together. Thus,
any errors introduced by a prior baseline estimate are eliminated, and a best fit to the data for a peak with a
baseline is determined.

In a first embodiment, after identification of data representing the chromatographic peak by select data
104, calculate parameters 105 uses an iterative curve fitting process to adjust the EMG parameters until an
optimal fit to the data is found. The optimality of the fit is measured by subtracting the currently estimated
peak line shape, sometimes called the peak trace, plus baseline estimate from the selected raw data and
summing the squares of the differences between the estimated peak line shape plus baseline estimate and
the selected raw data over the selected raw data region.

In one embodiment, the baseline is estimated at each of the iterations by selecting the baseline to
ensure that the error function is zero at the beginning and ending points of the selected raw data region. The
baseline is linearly interpolated between the start and end points.
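
The endpoint-anchored baseline and the sum-of-squares fit measure can be sketched as follows. This is a minimal illustration only: the function names, the plain Python lists, and the requirement of at least two data points are assumptions made here and are not taken from the specification.

```python
def endpoint_baseline(raw, trace):
    """Straight baseline chosen so the fit error is zero at the first and last points.

    raw   -- selected raw data values (hypothetical input layout)
    trace -- currently estimated peak line shape at the same points
    """
    n = len(raw)
    start = raw[0] - trace[0]      # baseline level that zeroes the error at the start
    end = raw[-1] - trace[-1]      # baseline level that zeroes the error at the end
    # Linear interpolation between the two anchor points.
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def sum_squared_error(raw, trace, baseline):
    """Sum of squared differences between raw data and (trace + baseline)."""
    return sum((r - (p + b)) ** 2 for r, p, b in zip(raw, trace, baseline))


trace = [0.0, 1.0, 4.0, 1.0, 0.0]
raw = [2.0, 3.5, 7.0, 4.5, 4.0]    # trace plus a sloping baseline 2.0 + 0.5*i
baseline = endpoint_baseline(raw, trace)
error = sum_squared_error(raw, trace, baseline)
```

An iterative fit would recompute the baseline for every candidate trace and keep the EMG parameters that give the smallest error.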

In another embodiment, the baseline is determined by finding the straight line fitting the difference of
the raw data and the currently estimated peak line shape by the minimum least square criterion. The
parameters describing the line y = a + b·x are found by using:

$$
a = \frac{\sum x_i \sum x_i y_i - \sum x_i x_i \sum y_i}{\sum x_i \sum x_i - \sum x_i x_i \sum 1},
\qquad
b = \frac{\sum x_i y_i \sum 1 - \sum x_i \sum y_i}{\sum x_i x_i \sum 1 - \sum x_i \sum x_i}
\tag{7}
$$



where

x_i = the integer index for the ith data point

y_i = raw data value for ith data point minus candidate line shape value for the ith data point.
The summations in Equation 7 extend over the range of data identified by select data 104. In these
embodiments, the baseline is determined as the parameters describing the peak are ascertained, unlike
prior art methods, described above, which subtract the baseline from the raw data and then deduce the
parameters describing the peak.
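
A minimal sketch of the least-squares baseline of Equation 7 is given below. The function name and the list-based inputs are illustrative assumptions; the sums are written in the same grouping as Equation 7, with the integer index starting at zero.

```python
def least_squares_baseline(raw, trace):
    """Return (a, b) of the line y = a + b*x fitted to raw minus trace.

    x is the integer index of each data point; y is the raw value minus the
    candidate line shape value, as defined for Equation 7.
    """
    y = [r - p for r, p in zip(raw, trace)]
    x = range(len(y))
    s1 = len(y)                                  # sum of 1 over the data points
    sx = sum(x)
    sy = sum(y)
    sxx = sum(i * i for i in x)
    sxy = sum(i * v for i, v in zip(x, y))
    a = (sx * sxy - sxx * sy) / (sx * sx - sxx * s1)
    b = (sxy * s1 - sx * sy) / (sxx * s1 - sx * sx)
    return a, b


trace = [0.0, 1.0, 4.0, 1.0, 0.0]
raw = [2.0, 3.5, 7.0, 4.5, 4.0]    # trace plus the line 2.0 + 0.5*x
a, b = least_squares_baseline(raw, trace)
```

For this synthetic input the fitted intercept and slope reproduce the line that was added to the trace.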

In another embodiment, calculate parameters 105 generates curvature data using the data provided by
select data 104. The curvature data are input signals to a neural net. The neural net, which has been
previously taught to determine EMG peak parameters based upon curvature input signals, generates output
signals corresponding to the EMG peak parameters best describing the input signals. The neural net
processing is more than thirty times faster than the iterative approach described above. After generation of
the EMG peak parameters by the neural net, in one embodiment a Simplex fit process uses the neural net
EMG peak parameters as an initial estimate, as described below, to obtain EMG parameters that accurately
reproduce the raw data.
After calculate parameters 105 determines an estimated set of parameters for the peak currently being
analyzed, the extrema for that peak are removed from the extrema file and then fused peak 107 (Fig. 7)
tests to determine whether classify extrema 102 indicated that the current peak was part of a fused peak
sequence. If the peak was not part of a fused peak sequence, processing returns to raw data 100 which
Alternatively, if the peak being analyzed by calculate parameters was part of a fused peak sequence,
fused peak 107 passes control to prior peak 108. Prior peak 108 sets a flag indicating that a prior peak was
analyzed and then returns control to raw data 100. The prior peak flag is used in select data 104 to
determine whether a prior peak exists and the information from classify extrema 102 is used by select data
104 to ascertain whether a following peak is included in the sequence. The process, illustrated in Fig. 7, is
repeated until all of the data in raw data file 100 have been analyzed.

The peak processing system of Fig. 7 is incorporated, in one embodiment, in an automated chromatographic
data processing system. The system, illustrated in Fig. 10A, includes a chromatographic detector
150, an integrator 154, and a computer 153. Integrator 154 includes an analog-to-digital converter 151 and a
processing unit 155. Analog-to-digital converter 151 includes voltage-to-frequency converter 151-1, a counter
151-2, and a timer 151-3. Computer 153 includes a display terminal, a central processing unit, a
keyboard, and a storage medium 152.

In a typical chromatographic analysis, a sample 142 is introduced to chromatographic column 144
through control valve 143. After the sample introduction, control valve 143 is positioned so that an eluent
provided from eluent storage tank 140 by pump 141 is passed through column 144. As described
previously, the eluent causes the components of the sample to move through column 144 at varying rates.
Detector 150 generates an output signal that varies with time, i.e., a time varying voltage, as the
components in the sample migrate through detector 150.

The detector output signal is processed by a voltage-to-frequency converter 151-1. Converter 151-1 is an
oscillator which pulses at a rate proportional to the voltage signal from detector 150. Counter 151-2 counts
the number of pulses from converter 151-1 for a period of time which is selected by timer 151-3. The output
signal from counter 151-2 is the number of counts in the selected time interval.

The output of converter 151-1 is processed by processing unit 155 to form data bunches, as described
below, and the data bunches are passed to the CPU of computer 153, which in turn stores the data on storage medium 152
such as a computer floppy disk, a computer hard disk, a computer tape, or perhaps a punched paper tape
or some other means for saving the data so that the data can be recovered and analyzed.

Selection and operation of detector 150, integrator 154 and computer 153 as well as chromatographic
column 144 are known to those skilled in the art. For example, as described above, detector 150 may be a
fluorescence detector, an absorbance detector, or a detector which measures refractive index.

Fig. 10B illustrates in more detail the conversion of the detector voltage versus time signal to a data
sample which is stored on storage medium 152. Each data sample starts at a time T_s and ends at a time
T_end and the number of oscillations of voltage-to-frequency converter 151-1 is counted over this period and is
represented by the vertical area between T_s and T_end in Fig. 10B. In one embodiment, the duration of the
data sample, i.e., the time interval between T_s and T_end, is equal to the period of the power line frequency.
Thus at 60 Hz, the data sample is for 16 2/3 milliseconds while at 50 Hz the data sample is for 20
milliseconds. Data samples S1, S2, S3, S4 are contiguous as shown in Fig. 10B. A selected number of data
samples, for example, samples S1, S2, S3, S4, are summed to form a data bunch B1 for the selected time
interval.
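
The relationship between line frequency, data sample duration, and data bunches can be sketched as below. The function names are hypothetical, and discarding a trailing incomplete bunch is an assumption of this sketch rather than something the text specifies.

```python
def sample_duration_ms(line_frequency_hz):
    """Duration of one data sample: one period of the power line frequency."""
    return 1000.0 / line_frequency_hz


def bunch_samples(samples, samples_per_bunch):
    """Sum contiguous data samples into data bunches.

    A trailing group with fewer than samples_per_bunch samples is dropped
    (assumption made for this illustration).
    """
    whole = len(samples) - len(samples) % samples_per_bunch
    return [
        sum(samples[i:i + samples_per_bunch])
        for i in range(0, whole, samples_per_bunch)
    ]


counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # counter output per data sample
bunches = bunch_samples(counts, 4)       # first bunch is S1 + S2 + S3 + S4
```

With four samples per bunch at 60 Hz, each data bunch in this sketch spans four line periods.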

Hence, in a preferred embodiment, the data processed according to the principles of this invention
consists of a time series of contiguous data bunches with each data bunch having an area which is related
to the material passing through the chromatographic column during that period of time. As explained more
completely below, the data samples are summed to form data bunches, because this allows improvement in
the signal-to-noise ratio.

More importantly, data bunching is used to optimize the data sample size for characterization of
individual chromatographic peaks. For example, the optimal data bunch size for characterization of a
one-second duration peak obtained using a gas chromatographic capillary column is not the same as the
optimal data bunch size for an amino acid peak which may be as much as five minutes wide. According to
the principles of this invention, the number of data bunches between the inflection points of a
chromatographic peak is a constant irrespective of the peak width. This constant is in the range of about 6-12 data
bunches because empirical results have indicated that the best estimate of a set of parameters which
the number of filters used in locate extrema 101 (Fig. 7), and (vi) time base information including a) whether
the A/D conversion was based upon 50 Hz or 60 Hz, b) whether the retention times are in minutes or
seconds and c) a scale factor to convert the indices of the data to a time. These six items are collectively
control to raw data 100. If an extremum was found by locate extrema 101, the second function of extrema
found 106 is to examine the extrema file and determine the number of extrema that have been detected. If
less than four extrema have been detected, sufficient data are not available to identify a peak, because as
shown in Figs. 3C and 6A-6D the minimum number of extrema required to identify a peak is four.

Accordingly, when less than four extrema have been detected, control is passed back to raw data 100
which supplies the next data bunch to locate extrema 101. Conversely, if a baseline and four or more
extrema have been found, extrema found 106 passes the extrema file to classify extrema 102.

If a baseline has not been found, extrema found 106 passes the extrema file to classify extrema 102
when eight or more extrema have been found, because even if the raw data signal has not returned to
baseline level, at least one peak must be included within the time range of the eight extrema. In one
embodiment, as described below, extrema found 106 is included within classify extrema 102.
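
The hand-off rule applied by extrema found 106 can be written as a small predicate. This is a sketch under the thresholds stated above (four extrema with a baseline, eight without); the function and argument names are hypothetical.

```python
def ready_to_classify(num_extrema, baseline_found):
    """Decide whether the extrema file should go to classify extrema 102.

    With a baseline, four extrema are enough to identify a peak; without one,
    eight extrema are required before classification is attempted.
    """
    if baseline_found:
        return num_extrema >= 4
    return num_extrema >= 8


# Three extrema are never enough; otherwise the baseline flag decides.
decisions = [
    ready_to_classify(3, True),
    ready_to_classify(4, True),
    ready_to_classify(7, False),
    ready_to_classify(8, False),
]
```

When the predicate is false, control would return to raw data 100 for the next data bunch.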

In this embodiment, when either a baseline and at least four extrema or more than eight extrema are
contained in the extrema file, the extrema are processed by classify extrema 102, and classified as extrema
representing either a resolved peak, slightly fused peaks of the same sign, slightly fused peaks of opposite
sign, strongly fused peaks of the same sign, or strongly fused peaks of opposite sign. Classify extrema 102
classifies the characteristics of the second derivative extrema in the extrema file using the pattern
recognition rules generated as described above. The specific rules are discussed more completely below.

In some prior art analyses, a second derivative of a chromatogram was used to identify the onset of a
peak, and to detect the existence of peak shoulders. However, according to the principles of this invention,
specific characteristics of the second derivative are used to classify peaks in a chromatogram as one of the
peaks or combinations of peaks in the minimum set rather than to identify specific features of the peaks.
Hence, more information is obtained about the chromatogram than in the prior art methods previously
described.

After classify extrema 102 classifies the peak or peaks represented by the data in the extrema file,
measure extrema 103 calculates the time and magnitude of the second derivative extrema for the identified
peak or peaks. In a first embodiment which uses a neural net in calculate parameters 105, the precise
location and magnitude of the second derivative extrema are an important aspect of the invention because
the accuracy of subsequent processing of the data by the neural net is dependent upon the accuracy of the
location and magnitude of the extrema. However, in another embodiment, subsequent processing is less
sensitive to the exact location and magnitude of the extrema. In the first mentioned embodiment, a
polynomial fit is used in measure extrema 103, but in the other embodiment, measure extrema 103 may
possibly be eliminated. The important aspect is that a set of peak parameters containing characterizing
information is provided for further analysis. This set of peak parameters includes the height of the peak
crest, the time of the peak crest, the peak width, and a measure of the peak skewness. As explained more
completely below, the set of peak parameters containing characterizing information is used to generate a
characterizing set of parameters for each chromatographic peak.
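
One simple way to refine the time and magnitude of a sampled extremum with a polynomial fit is a parabola through the extreme sample and its two neighbors. The text does not state the order of the polynomial or the number of points used in measure extrema 103, so the three-point quadratic below, its function name, and the assumption of equally spaced samples are illustrative choices only.

```python
def refine_extremum(times, values, i):
    """Fit a parabola through samples i-1, i, i+1 and return its vertex.

    times  -- equally spaced sample times (assumption of this sketch)
    values -- second derivative values at those times
    i      -- index of the sampled extremum (not the first or last sample)
    Returns (time, value) of the fitted extremum.
    """
    y0, y1, y2 = values[i - 1], values[i], values[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0.0:
        return times[i], y1              # flat neighborhood: keep the sample
    offset = 0.5 * (y0 - y2) / curvature # vertex position in units of one step
    step = times[i + 1] - times[i]
    return times[i] + offset * step, y1 - 0.25 * (y0 - y2) * offset


# Samples of y = 5 - (t - 1.3)**2; the true maximum is 5.0 at t = 1.3.
times = [0.0, 1.0, 2.0, 3.0]
values = [5.0 - (t - 1.3) ** 2 for t in times]
t_peak, y_peak = refine_extremum(times, values, 1)
```

Because the synthetic data are exactly quadratic, the fitted vertex recovers the true extremum between the sample points.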

After completion of measure extrema 103, select data 104 uses the information from classify extrema
102 and measure extrema 103 to prepare raw data 100 for processing by calculate parameters 105. The
function of select data 104 is (i) to select a range of raw data 100 for processing by calculate parameters
105 based upon the classification of the data in the extrema file by classify extrema 102, and (ii) to remove
contributions to the selected raw data from peaks other than the peak or peaks that are subsequently
characterized by calculate parameters 105.

To illustrate the effect of classify extrema 102 upon select data unit 104, four different cases, illustrated
in Fig. 8A through 8D, are considered. In the first case, peak 109 (Fig. 8A) is resolved. In the second case,
as illustrated in Fig. 8B, two peaks 110, 111 overlap and first peak 110 is being analyzed. In the third case
(Fig. 8C), two peaks 110, 111 again overlap but second peak 111 is being analyzed. In the fourth case (Fig.
8D), three peaks 114, 115, 116 overlap and middle peak 115 is being analyzed. In this embodiment,
calculate parameters 105 processes only data corresponding to a resolved peak. Thus, select data 104
removes contributions from any adjacent peaks to the peak being analyzed.

While four cases are shown in Figs. 8A through 8D, these cases are illustrative only of the principles of

data unit 104 must sequentially define the time range of the data corresponding to each of the peaks in
Figs. 8A-8D. In the first case (Fig. 8A), peak 109 is a resolved peak, and select data 104 determines the
time range of data required by calculate parameters 105 and supplies only that data to calculate parameters

being analyzed, so that only data for a single peak remains.

In the second case (Fig. 8B), the shape of second peak 111 is estimated by select data 104 using the
measured extrema for second peak 111. Specifically, a signature ratio CRATIO is formed for second peak
111. The signature ratio CRATIO is:

$$
\mathrm{CRATIO} = \frac{t_w}{t_c} \tag{6}
$$

where t_w (Fig. 9) is the time difference between the time of the second extremum of the second derivative
and the time of the second zero crossing and t_c is the difference in time between the two zero crossings of
the second derivative.

The time t_c between the two zero crossings is a measure of the peak width at the inflection points of the
peak. Time t_w is the time from the peak crest to the trailing inflection point, which corresponds to the
second zero crossing. Signature ratio CRATIO is a measure of the asymmetry of the peak. Consequently,
signature ratio CRATIO uniquely defines the skewness of an EMG peak. Since signature ratio is defined
using the time of the peak inflection points and the time of the maximum peak displacement, i.e. information
about the peak crest, the signal-to-noise ratio for the data used to define the signature ratio is usually not
affected by baseline level noise.
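
The signature ratio can be computed from a sampled second derivative as sketched below. The sketch assumes a positive resolved peak whose second derivative has exactly two zero crossings in the supplied window; the function name, the linear interpolation of the crossings, and the use of the most negative sample as the second extremum are illustrative assumptions.

```python
import math


def signature_ratio(times, d2):
    """CRATIO = t_w / t_c from a sampled second derivative of a positive peak."""
    crossings = []
    for i in range(len(d2) - 1):
        a, b = d2[i], d2[i + 1]
        if a * b < 0.0:
            # Sign change between samples: interpolate the crossing time.
            crossings.append(times[i] + (times[i + 1] - times[i]) * a / (a - b))
        elif a == 0.0 and i > 0 and d2[i - 1] * b < 0.0:
            # A sample that is exactly zero with opposite signs on either side.
            crossings.append(times[i])
    if len(crossings) != 2:
        raise ValueError("expected exactly two zero crossings")
    first, second = crossings
    # Second extremum: the most negative sample between the zero crossings.
    inside = [i for i, t in enumerate(times) if first < t < second]
    k = min(inside, key=lambda i: d2[i])
    t_w = second - times[k]
    t_c = second - first
    return t_w / t_c


# Second derivative of a unit-width gaussian peak centered at t = 0.
times = [-5.0 + 0.01 * i for i in range(1001)]
d2 = [(t * t - 1.0) * math.exp(-t * t / 2.0) for t in times]
cratio = signature_ratio(times, d2)
```

For a symmetric gaussian the crest lies midway between the inflection points, so the ratio comes out close to 0.5; a tailed EMG peak would give a different value.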

Signature ratio CRATIO is first used to estimate peak skewness. Peak skewness is used, as described
more completely below, to provide an initial estimate of the peak width, the peak amplitude and the peak
area. These estimated EMG peak parameters are used to calculate a trace for second peak 111 (Fig. 8C),
i.e. the estimated EMG parameters are used in Equation 4 to generate an EMG curve as a function of time.
The estimated trace for second peak 111 is subtracted from chromatographic data 100 so that corrected
raw data corresponding to first chromatographic peak 110 is obtained. The data representing only first
chromatographic peak 110 is passed to calculate parameters 105 which in turn determines the EMG
parameters describing first peak 110.
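
The subtraction of an estimated trace can be sketched as follows. Equation 4 is not reproduced in this passage, so the sketch assumes the commonly used EMG form (a gaussian of width sigma convolved with an exponential decay of time constant tau, scaled to a given area); the parameter names and function names are hypothetical and may differ from the parameterization of Equation 4.

```python
import math


def emg(t, area, center, sigma, tau):
    """Exponentially modified gaussian in a standard textbook form (assumed)."""
    z = (sigma / tau - (t - center) / sigma) / math.sqrt(2.0)
    growth = math.exp(sigma ** 2 / (2.0 * tau ** 2) - (t - center) / tau)
    return (area / (2.0 * tau)) * growth * math.erfc(z)


def subtract_trace(times, raw, params):
    """Remove the trace of an already estimated peak from the raw data."""
    return [r - emg(t, *params) for t, r in zip(times, raw)]


times = [0.1 * i for i in range(400)]
first = (3.0, 10.0, 1.0, 2.0)        # area, center, sigma, tau (illustrative)
second = (1.5, 14.0, 1.0, 2.0)
raw = [emg(t, *first) + emg(t, *second) for t in times]
corrected = subtract_trace(times, raw, second)   # data for the first peak only
```

In practice the trailing peak's parameters are only estimates, so the corrected data would still carry some residual from the second peak.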

In the third case (Fig. 8C), the second peak 111 of a pair of fused peaks 110, 111 is being analyzed.
Accordingly, using the procedure described for case two (Fig. 8B), the EMG parameters describing first
peak 110 have been ascertained by calculate parameters 105. Accordingly, the determined EMG parameters
for peak 110 are used to define a trace of peak 110. This trace is subtracted from the raw data for
fused peaks 110, 111 so that corrected raw data corresponding to second chromatographic peak 111 is
obtained. The corrected data representing only second chromatographic peak 111 is passed to calculate
parameters 105 which in turn determines the EMG parameters describing second peak 111.

In the fourth case (Fig. 8D), where the peak being analyzed 115 is overlapped by a prior peak 114 and
a trailing peak 116, a combination of the second and third procedures, described above, is used. In this
case, prior peak 114 has been characterized and a trace for peak 114 is generated using the determined
parameters as described for the third case above. This trace is subtracted from the raw data so that data for
second and third peaks 115, 116 remains. The signature ratio, defined above in the second case, is used to
determine a trace for third peak 116, the trailing peak. The trace for peak 116 is subtracted from the raw
data for second and third peaks 115, 116 to leave raw data corresponding to second chromatographic peak
115. Using these four cases, various combinations of fused chromatographic peaks can be successfully
analyzed.

In the above description, combinations of fused peaks were separated into resolved peaks by select
data 104. In another embodiment, select data 104 first determines whether a prior peak has been analyzed
and, if so, select data 104 subtracts the contribution of the prior peak from any trailing peak or peaks.
Calculate parameters 105 then analyzes either the remaining peak or the next pair of fused peaks.
Accordingly, in this embodiment the fused peaks are not separated into a series of resolved peaks but
rather pairs of fused peaks are analyzed, as described more completely below, by calculate parameters

characterize the detected peak.

The processing used in calculate parameters 105 is broken into three general categories: (1) lookup
tables, (2) neural network analysis, and (3) curve fitting analysis, e.g., Simplex fitting analysis. Lookup table
used in this embodiment. In view of this disclosure, other selected attributes and characteristics of the
selected attributes can be used with the principles of this invention to characterize data having the selected
attribute. Accordingly, the use of the second derivative and the extrema of the second derivative are
illustrative only of the principles of this invention and are not intended to limit the scope of the invention.

The first step in data characterization 90 is to analyze measured chromatographic data to ascertain the
types of peaks generally observed. For example, as described above, in measured chromatographic data
the types of the peaks include gaussian, EMG and general shapes. Further, each type of peak may be
either a positive peak or a negative peak. For each type of peak, a set of characterizing parameters, as
previously described, is defined. Usually, this step can be accomplished by examining the prior art literature
for the measurements of interest. Alternatively, measured data having resolved peaks could be digitized and
the resolved peaks statistically analyzed to define the types of peaks in the data and the parameters
necessary to characterize the peaks.

The second step in data characterization 90 uses the different peaks identified in the first step to define
a minimum set of peaks including resolved peaks and combinations of resolved peaks so that each peak in
a chromatogram can be classified by pattern recognition means 95. The important aspect of this step is to
select a first resolved peak and then identify the possible combinations of resolved peaks with this peak. I
discovered that if chromatographic data is processed sequentially forward in time, each fused peak
sequence in a chromatogram can be represented as sequential pairs of fused peaks. For example, as
illustrated in Figs. 19A to 19E, a fused peak sequence containing four peaks 302-305 can be analyzed by
sequentially identifying (i) peaks 302, 303 (Fig. 19B), (ii) peaks 303, 304 (Fig. 19C) and (iii) peaks 304, 305.
The actual analysis is described below.

The possible combinations of pairs of resolved peaks include (i) a positive resolved peak with either
another positive resolved peak or a negative resolved peak, or (ii) a negative resolved peak with either
another negative resolved peak or a positive resolved peak. Each combination of resolved peaks creates a
fused peak pair. In addition to the combinations of peaks, the degree of overlap, usually referred to as
fusion, of the combinations of peaks must be defined. In a chromatogram, the peaks may be either slightly
fused (Fig. 19D) or strongly fused (Fig. 19B). The degree of fusion between peaks, i.e. slightly or strongly
fused, that must be considered is determined by the attributes used in pattern recognition, as discussed
below.

Thus, for chromatograms, the minimum set includes (i) resolved peaks, (ii) slightly fused peaks of the
same sign, (iii) slightly fused peaks of opposite sign, (iv) strongly fused peaks of the same sign, and (v)
strongly fused peaks of opposite sign.

Hence, data characterization 90 first determines from measured data the characteristics of the data of
interest, i.e. the types of peaks. Then a minimum set of resolved peaks and combinations of resolved
peaks required to sequentially identify fused peak sequences in a chromatogram is determined.
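
The representation of a fused peak sequence as sequential pairs can be illustrated with a short sketch. The function name is hypothetical, and the peak labels reuse the reference numerals 302-305 from the example above.

```python
def sequential_pairs(peaks):
    """Split a fused peak sequence into overlapping pairs, forward in time."""
    return list(zip(peaks, peaks[1:]))


pairs = sequential_pairs([302, 303, 304, 305])
```

Each interior peak appears in two consecutive pairs, first as the trailing peak and then as the leading peak.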

After the measured data are characterized by data characterization 90, attribute definition 91 (Fig. 5A)
determines a selected attribute of each peak and each fused peak combination in the minimum set. In one
embodiment, the selected attribute is the second derivative with respect to time and so the second
derivative for each resolved peak and each fused peak pair in the minimum set from data characterization
90 is calculated. Use of the second time derivative as the selected attribute places a limitation on the data
that can be processed according to the principles of this invention. The characteristics (peaks) of the data
that are analyzed must have a continuous second time derivative. Other pattern recognition means could be
developed using other attributes of the measured data by one skilled in the art in view of this disclosure.

For chromatograms, attribute definition 91 determines the second time derivative for each resolved
peak and each fused peak combination in the minimum set of peaks. For example, as in Fig. 3C, the
second derivative 62 with respect to time of positive resolved peak 60 starts to rise until the signal reaches
a first positive maximum, a first extremum 62-1. After the signal reaches the first positive maximum 62-1,
second derivative 62 decreases, passing through zero. This is a first zero crossing 62-2 of second derivative
62 and corresponds to a first inflection point of peak 60. After first zero crossing 62-2, second derivative 62
continues to decrease until second derivative 62 reaches a minimum, a second extremum.
Second derivative 62 then increases, passing through zero at a second zero crossing, which corresponds to the second inflection point of peak
60, until derivative 62 reaches a second positive maximum, the third extremum. After second derivative
62 passes through the second positive maximum, the signal decreases. After second derivative 62 goes to
zero for a selected period of time, a fourth extremum for peak 60 is defined which corresponds to the end
Fig. 6A illustrates the second time derivative 70 of slightly fused peaks of the same sign. The second
derivative has eight extrema 70-1 through 70-8. Fig. 6B illustrates the second time derivative 71 for slightly fused
peaks of opposite sign, which has seven extrema 71-1 through 71-7. Figs. 6C and 6D illustrate the second time
derivative 72 for strongly fused peaks of the same sign with five extrema 72-1 through 72-5 and strongly fused peaks
73 of opposite signs with five extrema 73-1 through 73-5, respectively. The waveforms in Fig. 3C and Figs. 6A-6D
represent calculated attributes of a resolved EMG peak and of combinations of pairs of EMG peaks.
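
The shape of the second derivative of a resolved peak can be checked numerically with a small sketch. A gaussian peak, a three-point second difference, and a simple neighbor comparison are used here as illustrative assumptions; they are not the digital filters of locate extrema 101, and the sketch finds only the three turning points, not the final return-to-zero extremum.

```python
import math


def second_difference(signal):
    """Three-point second difference, proportional to the second derivative."""
    return [signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]


def turning_points(values, floor):
    """Return ('max' or 'min', index) for local extrema larger in size than floor."""
    found = []
    for i in range(1, len(values) - 1):
        v = values[i]
        if abs(v) < floor:
            continue                      # ignore the flat tails
        if v > values[i - 1] and v > values[i + 1]:
            found.append(("max", i))
        elif v < values[i - 1] and v < values[i + 1]:
            found.append(("min", i))
    return found


times = [-6.0 + 0.05 * i for i in range(241)]
peak = [math.exp(-t * t / 2.0) for t in times]          # positive resolved peak
curvature = second_difference(peak)
floor = 0.01 * max(abs(v) for v in curvature)
kinds = [kind for kind, _ in turning_points(curvature, floor)]
```

For this positive peak the turning points appear in the order maximum, minimum, maximum, matching the sequence described for second derivative 62.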

In pattern recognition 92 (Fig. 5A), the characteristics of each second derivative from attribute definition
91 are analyzed, and a means for uniquely identifying each of the second derivatives is ascertained. In one
embodiment of pattern recognition 92, an empirical set of rules for identifying each of the second time
derivative traces was generated, as described more completely below with respect to classify extrema 102
(Fig. 7). These rules define relationships between the second derivative extrema for the various groups of
peaks in the minimum set and thereby provide a means for identification of each of the second derivative
traces. The features of the second derivative are used to distinguish between fused peaks, i.e., slightly and
strongly fused, in the minimum set of peaks. Specifically, the unique characteristics of the second derivative
of slightly fused peaks and strongly fused peaks are sufficient to uniquely identify the various degrees of
peak overlap that are encountered in chromatograms.

However, pattern recognition 92 (Fig. 5A) is not limited to a set of rules. A processing system, such as
a neural network similar to those described more completely below, could be trained using second
derivative extrema to identify each of the resolved peaks and combinations of resolved peaks in the
minimum set. Alternatively, a neural net and a set of rules, similar to those described above, might be used.
In this embodiment the neural net is trained to generate a set of output signals, based upon input signals
such as either the second time derivative or characteristics, e.g. extrema, extracted from the second time
derivative. The neural net output signals are processed by the set of rules so as to identify the peak or peak
combination represented by the input signals to the neural net.

The second derivative pattern recognition means 95 (Fig. 5A) generated by pattern recognition 92
provides, as described more completely below, a means for classification of measured data which does not
require a prior baseline determination. While in this embodiment, the curvature for each resolved peak and
each combination of resolved peaks in the minimum set has been used as the selected attribute for pattern
recognition, in another embodiment another attribute can be selected.

With another attribute, iterations may be required between select attribute 91 and pattern recognition
92. For example, on a first iteration a selected attribute could be used in select attribute 91 and if a pattern
recognition means cannot be successfully generated by pattern recognition means 92, a new selected
attribute or a second selected attribute in combination with the first selected attribute must be generated by
select attribute 91. This process continues until pattern recognition 92 produces a means for uniquely
identifying the presence of each of the peaks and combinations of peaks in the minimum set.

Analyze data 93 and parameter set 94, as illustrated in Fig. 5B, are further subdivided, in one
embodiment, into a five-step method that is used to generate a set of parameters which characterize each
chromatographic peak detected. The five steps are: (1) locate extrema 101; (2) classify extrema 102; (3)
measure extrema 103; (4) select data 104; and (5) calculate parameters 105. The first and second steps are
included in analyze data 93 while the fourth and fifth steps are included in parameter set 94. As explained
more completely below, measure extrema 103 is required for certain embodiments of calculate parameters
105 and can be considered as a part of either analyze data 93 or parameter set 94.

The principles of this invention are illustrated using chromatographic data which includes, for example,
gaussian and exponentially modified gaussian (EMG) peaks. However, the method is applicable to any data
having peaks with a characteristic shape. Accordingly, the principles of this invention may be used, for
example, to determine a set of parameters which characterize general chromatographic peaks, peaks
associated with neutron activation analysis, or capillary zone electrophoresis peaks.

As illustrated in Fig. 7, digitized chromatographic data 100 are processed forward in time by locate
extrema 101. Locate extrema 101 uses a series of digital filters, as described below, to calculate the
characteristics of the second time derivative of data 100. Locate extrema 101 generates a data set, called



program. Alternatively, the data set can be stored on any storage device that is accessible by locate
extrema 101 and by the subsequent operations that use the stored data.

Extrema found 106 performs two functions in this embodiment. The first function is to determine
Figure 1A illustrates a typical chromatographic trace.

Figure 1B illustrates a prior art baseline correction for a resolved peak.

Figure 2 illustrates a baseline corrected chromatographic peak and parameters used to characterize
the corrected peak.

Figures 3A, 3B, and 3C illustrate an exponentially modified gaussian (EMG) chromatographic peak,
the first derivative of the EMG peak, and the second derivative of the EMG peak, respectively.

Figure 4 illustrates the parameters used in one prior art method to characterize two fused
chromatographic peaks.

Figures 5A and 5B are a general block diagram illustrating the method for processing
chromatographic data according to the principles of this invention.

Figures 6A-6D illustrate the second time derivative for fused peak combinations encountered in
chromatographic measurements.

Figure 7 is a more detailed block diagram illustrating one embodiment of a method for processing
chromatographic data according to the principles of this invention.

Figures 8A-8D illustrate resolved and fused peaks that are typical of chromatographic data.

Figure 9 illustrates the characteristics of the second time derivative of a positive resolved EMG peak
which are used to calculate the signature ratio according to the principles of this invention.

Figure 10A is an illustration of one embodiment of an automated chromatographic system suitable for
generating raw data 100 which is processed according to the principles of this invention.

Figure 10B illustrates data samples and data bunches as used in this invention.

Figure 11 is an illustration of the data filter in locate extrema 101 according to the principles of this
invention.

Figure 12 is a detailed block diagram illustrating one embodiment of locate extrema 101 according to
the principles of this invention.

Figure 13 is a block diagram of determine case 102 according to the principles of this invention.

Figure 14 is a block diagram of select data 104 of this invention.

Figure 15 illustrates one embodiment of adjust raw data 104-1 of this invention.

Figure 16 illustrates a neural net as implemented in this invention.

Figure 17 is a flow diagram for the Simplex fit used in this invention.

Figure 18 is a graphical representation of the Simplex fit used in this invention.

Figure 19 illustrates one example of a portion of a chromatogram which is analyzed according to the
principles of this invention.

Figure 20 is a flow diagram for one embodiment of calculate parameters 105 according to the
principles of this invention.

DETAILED DESCRIPTION 



Unlike prior art methods for characterization of chromatograms which required prior determination of a
baseline, the method and apparatus of this invention determine the retention time, peak width, peak area,
and peak skewness for chromatographic data having resolved peaks, slightly fused peaks of the same sign,
slightly fused peaks of opposite sign, strongly fused peaks of the same sign, and strongly fused peaks of
opposite sign without a prior determination of a baseline. The retention time, peak width, peak area and
peak skewness for each chromatographic peak, a set of characterizing parameters for a peak, and the
baseline signal level for the peak are determined together.

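The simultaneous determination of the set of characterizing parameters for a chromatographic peak and
the baseline for that peak eliminates the baseline correction prior to characterization of the data as in the
prior art methods described above. Further, the above prior art methods used signals having about the
same magnitude as the baseline signal to define the baseline. Hence, the signal-to-noise ratio in the
baseline determination was very small and accordingly the correction of the data for the baseline
contribution was often difficult. As explained more completely below, in the system and method of this
invention, information about the peak crest is used to estimate the peak shape and in turn the
characterizing set of parameters for the peak and the baseline. Since the signal-to-noise ratio is typically
the largest about the peak crest, the prior art problems associated with the small signal-to-noise ratio in
baseline corrections are not encountered.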

Moreover, according to the principles of this invention, extrema of the curvature of the chromatographic
signal are used to identify each peak in a chromatogram. The curvature further reduces the noise problems
associated with a wandering baseline because if the baseline signal is constant or the baseline signal
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contribution from the baseline so that the start and end of a peak are easily determined from the curvature
signal. Further, curvature extrema for both positive and negative peaks are determined in the same manner
so that, in contrast to prior art methods described above, characterization of negative peaks with a
wandering baseline is fully equivalent to characterization of positive peaks.

As used herein, the "sign" of a peak, positive or negative, is determined by the peak amplitude. If the
peak amplitude has a positive maximum, the peak is a positive peak. Conversely, if the peak amplitude has
a negative minimum, the peak is a negative peak. In general terms for both positive and negative peaks, the
phrases "maximum/minimum amplitude" and "peak crest" are used hereinafter to characterize the maximum
height, i.e., displacement, of the peak.

According to the principles of this invention, characterization of peaks in a chromatogram is a two-step
process. In a first step, data corresponding to each peak or each pair of peaks in the chromatogram is
identified. As explained more completely below, a unique filter apparatus locates extrema of the curvature of
the chromatographic data and a file is generated containing characteristics of the extrema. A pattern
recognition apparatus analyzes the characteristics of the located extrema and classifies the peak or peak
combination represented by the data in the file as one peak or one peak combination in a set of resolved
peaks and selected combinations of resolved peaks. A portion of the chromatographic data, which
corresponds to the peak or peak combination identified by the pattern recognition apparatus, is then
selected for further analysis. This portion of the data includes both the signal for the peak and the signal for
the baseline upon which the peak is superimposed.

In the second step, data for a peak or a peak combination identified as described above, or in the
alternative identified by some other process, is processed and a set of characterizing parameters for the
peak or the first peak in the peak combination is generated without a prior baseline correction to the data.
As explained more completely below, the peak data including the baseline level upon which the peak is
superimposed is analyzed using one of lookup tables, neural nets, curve fitting, and combinations of lookup
tables, neural nets and curve fitting. Each of these characterization processes, using information about the
peak crest and the peak inflection points, determines a set of characterizing parameters and a baseline
estimate that best fit the identified data. Thus, the peak characterization according to the principles of this
invention is not biased by a prior baseline correction.

The determination of a set of characterizing parameters and a baseline estimate that best fit the
identified data is fundamentally different from the prior art methods described above. In those methods, a
three step process was used to determine a set of characterizing parameters. The three steps were (i)
estimate a baseline, (ii) correct the raw data using the estimated baseline and (iii) estimate a set of
characterizing parameters for the baseline corrected data. Hence in this prior art process, a prior baseline
determination was used and as previously described the prior baseline determination was often complicated
by a wandering or an indeterminate baseline and a poor signal-to-noise ratio. In contrast, according to the
principles of this invention, a baseline and a characterizing set of parameters are determined simultaneously,
i.e., in the same processing step, so that a prior baseline determination is not necessary. Thus,
any errors introduced by a prior baseline estimate are eliminated, and a set of parameters, which
characterize the peak, and a baseline are estimated to achieve the best fit to the measured data.

The process of this invention is illustrated as a flow diagram in Figs. 5A and 5B. As previously
mentioned, the process of this invention is a two-step process and this is illustrated in Fig. 5B as analyze
data 93 and parameterize data 94. Analyze data 93 includes a pattern recognition means for classification
of peaks in a chromatogram. The process used to develop the pattern recognition means is illustrated in
Fig. 5A.

Briefly, the process used to develop the pattern recognition means includes three steps. The first step,
data characterization 90, defines the peaks and combinations of peaks that must be recognized. The
second step, attribute definition 91, determines the attributes for each of the peaks and combinations of
peaks identified by data characterization 90 that are subsequently used in development of pattern
recognition means 95. Finally, the third step, pattern recognition 92, analyzes the attributes generated by
attribute definition 91 and generates pattern recognition means 95. As described more completely below,



having peaks with a characteristic shape. Thus, principles of this invention may be used, for example, to
analyze peaks associated with neutron activation analysis or capillary zone electrophoresis.

Further, as described more completely below, a selected attribute, the second derivative of the
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process. In a first step, data corresponding to each peak or each pair of peaks in the chromatogram is
identified. A unique filter apparatus locates extrema of the curvature of the chromatographic data and
characteristics of the extrema are stored for further processing. A pattern recognition apparatus analyzes
the stored characteristics of the extrema and classifies the stored data as representing one peak or one
peak combination in a set of resolved peaks and selected combinations of resolved peaks. A portion of the
chromatographic data, which corresponds to the peak or peak combination identified by the pattern
recognition apparatus, is then selected for further analysis. This portion of the raw data includes both the
signal for the peak and the signal for the baseline upon which the peak is superimposed.

In the second step, data for a peak or a peak combination identified as described above, or in the
alternative, identified by some other process, is processed and a set of characterizing parameters for the
peak or the first peak in the peak combination is generated without a prior baseline correction to the data.
The peak data including the baseline level upon which the peak is superimposed is analyzed using one of
lookup tables, neural nets, curve fitting, and combinations of lookup tables, neural nets and curve fitting.
Each of these characterization processes, using information about the peak crest and the peak inflection
points, determines a set of characterizing parameters and a baseline estimate that best fit the identified
data. Thus, the peak characterization according to the principles of this invention is not biased by a prior
baseline correction.

In one embodiment, the process of this invention is used in an automated data system which includes a
detector for providing a time varying output signal. The output signal includes a plurality of peaks
superimposed on a baseline. The detector output signal is converted to a time series of digitized data by an
analog-to-digital converter and the digitized data is stored for further processing.

The digitized data is retrieved from the storage means and a second time derivative of the digitized
data is generated. As the second time derivative of the digitized data is generated, the extrema of the
second time derivative are located. The characteristics of each extremum are stored for further processing.
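
The step just described (form a second time derivative, then locate its extrema) can be illustrated with a minimal sketch. It assumes a plain three-point central difference and a simple strict-neighbour test; the filter actually used by the invention is described later in the specification and is not reproduced here. The names `second_difference`, `locate_extrema` and the `noise_floor` parameter are hypothetical, and the synthetic peak is gaussian only for simplicity.

```python
import math


def second_difference(samples, dt):
    """Central second difference, an estimate of the second time derivative."""
    return [
        (samples[i - 1] - 2.0 * samples[i] + samples[i + 1]) / (dt * dt)
        for i in range(1, len(samples) - 1)
    ]


def locate_extrema(curvature, noise_floor):
    """Return (raw_sample_index, value) for each strict local extremum.

    Extrema whose magnitude does not exceed noise_floor are ignored; the
    noise floor is a hypothetical tuning parameter of this sketch.
    """
    found = []
    for k in range(1, len(curvature) - 1):
        prev, cur, nxt = curvature[k - 1], curvature[k], curvature[k + 1]
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if (is_max or is_min) and abs(cur) > noise_floor:
            found.append((k + 1, cur))  # curvature[k] belongs to samples[k + 1]
    return found


# Synthetic positive peak (gaussian, for simplicity) on a sloping baseline.
DT = 0.1
times = [i * DT for i in range(201)]
signal = [2.0 + 0.5 * t + math.exp(-0.5 * (t - 10.0) ** 2) for t in times]
extrema = locate_extrema(second_difference(signal, DT), noise_floor=1e-6)
for index, value in extrema:
    print(f"t = {times[index]:.1f}, curvature = {value:+.3f}")
```

In this sketch the sloping baseline contributes nothing to the second difference beyond rounding error, so the located extrema come from the peak alone: a negative minimum at the crest flanked by two positive maxima.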

After each extremum is stored, the number of stored extrema is tested to determine whether sufficient
extrema are available for identification of a peak by a pattern recognition means. When either at least four
extrema are stored, one of which is a baseline, or at least eight extrema and no baseline are stored, the
pattern recognition means processes the stored extrema and identifies the extrema as representing either (i)
one peak in a set of resolved peaks or (ii) one peak combination in a set of fused peak combinations. In one
embodiment, the pattern recognition means uses a set of empirical rules to classify the stored extrema as
representing a peak or peak combination in the five groups of peaks including (i) resolved peaks, (ii) slightly
fused peaks of the same sign, (iii) slightly fused peaks of opposite sign, (iv) strongly fused peaks of the
same sign, and (v) strongly fused peaks of opposite sign.
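
The readiness test and the five output groups named above can be written down compactly. The following is only a sketch: the record layout (a dict with an `is_baseline` flag) and the names `PeakCase` and `enough_extrema` are assumptions made for illustration, and the empirical classification rules themselves are not shown.

```python
from enum import Enum


class PeakCase(Enum):
    """The five groups named in the text (labels are illustrative)."""
    RESOLVED = 1
    SLIGHTLY_FUSED_SAME_SIGN = 2
    SLIGHTLY_FUSED_OPPOSITE_SIGN = 3
    STRONGLY_FUSED_SAME_SIGN = 4
    STRONGLY_FUSED_OPPOSITE_SIGN = 5


def enough_extrema(stored):
    """Apply the stated readiness test to a list of stored extremum records.

    Each record is a dict with a hypothetical boolean key "is_baseline".
    Four records suffice when one of them is a baseline; otherwise eight
    records are required.
    """
    count = len(stored)
    has_baseline = any(record["is_baseline"] for record in stored)
    if has_baseline:
        return count >= 4
    return count >= 8
```

A caller would append each newly located extremum to the list and invoke the pattern recognition step only once `enough_extrema` returns true.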

The process used to identify each of the peaks in a measured chromatogram represents a significant
advancement in the analysis of chromatograms. In the prior art methods of Foley and Dorsey, assumptions
were made about the fused peak sequences and empirical relationships used to identify the characteristics
of the fused peaks. According to the principles of this invention, each peak in the measured data is
identified so that only data for that peak can be selected from the raw data file and subsequently analyzed.
The identification process provides the user with a powerful tool for analysis of chromatograms, because
the identification process is not affected by a wandering or changing baseline.

After identification of the measured peak or measured peak combination, the identification in conjunction
with the extrema for the peak or peak combinations are input to a data selection means which in turn
selects a range of raw data, i.e., a portion of the measured data which includes the measured peak or peak
combination. The automated system then calculates a characterizing set of parameters for the measured
peak using only the selected portion of the raw data. Unlike the prior art methods described above, the raw
data is not baseline corrected prior to processing by the peak characterizing means. Rather, the peak
characterizing means of this invention calculates a set of characterizing parameters and a baseline estimate
for the peak using raw data that includes the signal level associated with the peak and the baseline signal
level.

Several alternative embodiments are described for calculating the characterizing set of parameters and



set of characterizing parameters based upon the second time derivative of the selected raw data which is
provided as input signals to the neural network. In these embodiments, a peak shape is generated using the
set of characterizing parameters and the peak shape is subtracted from the raw data to form the baseline
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the baseline, and then determined the characterizing parameters, in these embodiments the baseline
estimate is done only after the characterizing parameters are ascertained.

In yet another embodiment, a curve fitting procedure is used to determine the best fit of a set of peak
characterizing parameters to the selected raw data. In the curve-fitting process, the best set of characterizing
parameters for the selected raw data is iteratively determined in combination with a least squares
estimate of the baseline.
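
One way to picture a fit in which shape parameters and baseline are found together is sketched below. It is an illustration under stated assumptions, not the procedure of the invention: the peak is modelled as a gaussian rather than an EMG, the baseline as a constant, and the outer iteration is a plain grid search standing in for the Simplex fit. For each trial center and width, the peak height and baseline enter the model linearly, so they follow from a closed-form least squares solution on the raw (uncorrected) data. All function names are hypothetical.

```python
import math


def gaussian(t, center, sigma):
    return math.exp(-0.5 * ((t - center) / sigma) ** 2)


def linear_part(times, data, center, sigma):
    """Least squares height h and constant baseline b for a fixed peak shape.

    Solves the 2x2 normal equations of data ~ h * g(t) + b.
    """
    g = [gaussian(t, center, sigma) for t in times]
    n = len(times)
    sg, sgg = sum(g), sum(v * v for v in g)
    sy, sgy = sum(data), sum(v * y for v, y in zip(g, data))
    det = n * sgg - sg * sg
    h = (n * sgy - sg * sy) / det
    b = (sgg * sy - sg * sgy) / det
    sse = sum((y - h * v - b) ** 2 for v, y in zip(g, data))
    return h, b, sse


def fit_peak(times, data, centers, sigmas):
    """Search trial shapes; return (center, sigma, height, baseline, sse)."""
    best = None
    for c in centers:
        for s in sigmas:
            h, b, sse = linear_part(times, data, c, s)
            if best is None or sse < best[-1]:
                best = (c, s, h, b, sse)
    return best


# Raw data: a peak of height 3 sitting on a baseline of 2, no prior correction.
times = [i * 0.1 for i in range(101)]
raw = [2.0 + 3.0 * gaussian(t, 5.0, 0.8) for t in times]
centers = [4.0 + 0.1 * k for k in range(21)]
sigmas = [0.4 + 0.1 * k for k in range(9)]
center, sigma, height, baseline, _ = fit_peak(times, raw, centers, sigmas)
```

The design point the sketch tries to convey is that the baseline is never estimated ahead of the peak: it is one more unknown in the same least squares problem.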

Combinations of the lookup table, neural network and curve fitting are also used to generate a set of
characterizing parameters for each peak. In another embodiment, a pair of neural networks are used to
generate characterizing parameters for each resolved peak and a Simplex fit is used to generate a set of
parameters for each peak in fused peak sequences.

Another feature of this invention includes a method for generating the pattern recognition means used in
identification of the peak or peak combination described above. In this method, specified data types, for
example peaks, in measured data are identified and each specified data type is characterized. For example,
in chromatograms the peaks may be of a gaussian, exponentially modified gaussian or general shape.
Moreover, the peaks in a chromatogram are often fused. Thus, each peak in the fused peak sequence must
be classified as one of the specified data types so that a characterizing set of parameters can be obtained
for each peak.
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processing the fused peak sequence as pairs of peaks and classifying each fused pair of peaks. The 

classification of pairs of fused peaks first requires determination of the possible combinations of pairs of a
specified data type, for example EMG peaks. From all possible combinations of pairs of EMG peaks, in one
embodiment, the minimum number N of combinations of pairs of EMG peaks required to reproduce the
peak pair combinations in measured data are empirically determined. In this embodiment, the N combinations
of pairs of EMG peaks are (i) slightly fused peaks of the same sign, (ii) slightly fused peaks of opposite
sign, (iii) highly fused peaks of the same sign, and (iv) highly fused peaks of opposite sign.

After determination of the specified data type and the minimum N combinations of pairs of the specified 
data type, the next step in the generation of pattern recognition means is to find an attribute which can be 
calculated for the specified data type and each of the N combinations of pairs of the specified data type. 
Preferably, the attribute is a waveform which has different characteristics for the specified data type and
each of the N combinations of the specified data type. For example, when the specified data type was
resolved EMG peaks, the attribute was the second derivative with respect to time and the different
characteristics of the second derivative with respect to time were the extrema of the second time derivative.

After the attribute and the unique characteristics of the attribute for the specified data type and each of
the minimum N combinations are ascertained, the final step in generation of a pattern recognition means is
developing a means for uniquely identifying the characteristics of the attribute for each of the specified data
types and each of the N combinations. This means comprises the pattern recognition means because the
means for uniquely identifying the characteristics of the attribute can be used to classify measured
characteristics of the attribute. For example, in one embodiment, an empirical set of rules was generated to
classify extrema of the second time derivative of measured chromatographic data.

Another feature of this invention includes a method and apparatus for locating extrema, i.e., characteristics,
of the second derivative, i.e., an attribute, of data bunches having a characteristic interval. Typically,
the characteristic interval is a selected period of time, sometimes called a time width or time window. The
method and apparatus includes at least two filters which are used in locating the characteristics of the
second derivative. The first filter generates a curvature value for data bunches having the characteristic
interval. A second data set is formed with each data bunch in the second set having a second characteristic
interval that is an integer multiple of the first-mentioned characteristic interval. The second filter generates a
curvature value for data bunches having the second characteristic interval. The first and second filters
sequentially calculate a curvature for each data bunch in the first and second data set respectively.
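
The idea of computing curvature on two data sets whose bunch intervals differ by an integer multiple can be sketched as follows. Several points are assumptions of this illustration rather than statements from the specification: a data bunch is taken to be the average of consecutive samples, the curvature filter is a plain three-point second difference, and the alignment and sequencing of the two filters are not modelled. The names `bunch` and `bunch_curvature` are hypothetical.

```python
def bunch(samples, size):
    """Average consecutive groups of `size` samples; a trailing partial group is dropped."""
    return [
        sum(samples[i:i + size]) / size
        for i in range(0, len(samples) - size + 1, size)
    ]


def bunch_curvature(bunches, interval):
    """Three-point second difference over bunches spaced `interval` apart in time."""
    return [
        (bunches[i - 1] - 2.0 * bunches[i] + bunches[i + 1]) / (interval * interval)
        for i in range(1, len(bunches) - 1)
    ]


# Samples of t**2 at unit spacing: the true second derivative is 2 everywhere.
samples = [float(t * t) for t in range(24)]
first = bunch_curvature(bunch(samples, 2), 2.0)   # characteristic interval of 2
second = bunch_curvature(bunch(samples, 4), 4.0)  # interval doubled (integer multiple)
```

Both filters recover the same curvature for this smooth test signal; on real data the wider interval trades time resolution for noise rejection, which is presumably why more than one interval is useful.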

To locate the extrema of the second derivative, the calculation of the curvature by the first and second
filters is sequenced so that the filters are maintained in alignment with respect to the data bunches being
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non-gaussian peaks. Fig. 3A illustrates an EMG peak. Fig. 3B illustrates the slope, the first derivative with
respect to time, of the EMG peak and Fig. 3C illustrates the curvature, the second derivative with respect to
time, of the EMG peak. The EMG peak in Fig. 3A is a positive resolved peak because the peak has a
positive maximum. However, negative resolved peaks, i.e., peaks having a negative minimum, are also
encountered in chromatograms.

Chromatographic peaks are often fused as shown in Fig. 1A. The analysis of fused peaks is much less
straightforward and the results are dependent upon the means defined to separate the peaks as well as the
baseline correction. In one method (see, for example, Spectra Physics, "SP4270 Computing Integrator
Operator's Manual, Section Seven - Principles of Integration," 1982, Spectra Physics Part No. A 0099-110),
the areas are allocated by dropping a perpendicular line from the valley separating peaks to the interpolated
baseline. In an alternative approach, the peaks are "skimmed" by taking baseline references at one or more
valleys. The vertical drop method and skimming method are very inaccurate in their allocation of area
between fused peaks. Errors in excess of 30% typically occur in area allocations for fused peaks using
either of these methods. In these methods, the slope and/or the curvature of the detector signal have been
used to identify the peak maximums and the valley between the maximum of two fused peaks.

Foley in "Systematic Errors in the Measurement of Peak Areas and Peak Height for Overlapping
Peaks," J. of Chromatography, 384, 301-313 (1987) suggested methods of estimating EMG line shape
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corrected peaks. The parameters used by these investigators, as illustrated in Fig. 4, are apparent peak
heights h_{p,1} and h_{p,2}, valley height h_v, and peak widths a, b. These parameters are defined after the peaks
have been baseline corrected. Therefore, as described below, while the method of Foley is better than the
vertical drop and skimming methods described above, the method is limited by the accuracy of the
baseline determination.

Foley suggests that a logical approach for quantitation of fused peaks is to develop a quantitative
method based upon measurements in regions of minimum distortion. He further suggests that for two fused
peaks there is much less distortion for the first peak than the second, as evidenced by the generally
insignificant errors in peak height for the first peak. Foley derived the following empirical equation for the
area of a peak:

$$A = 1.64\, h_p\, W_{0.75}\, (b/a)^{0.717} \qquad (5)$$

where A is the peak area, h_p is the peak height and W_{0.75} and (b/a) are the peak width and asymmetry
measured at 75% of the peak height, respectively.
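
As a quick numerical check of equation (5), read here as A = 1.64 h_p W_0.75 (b/a)^0.717, the sketch below applies it to an ideal gaussian peak, for which the asymmetry b/a is 1 and the width at 75% of the height follows in closed form. The function name `foley_area` and the test values are illustrative assumptions.

```python
import math


def foley_area(height, width_75, asymmetry_75):
    """Empirical peak area from height, width and asymmetry at 75% of the height."""
    return 1.64 * height * width_75 * asymmetry_75 ** 0.717


# Ideal gaussian: full width at 75% height is 2 * sigma * sqrt(2 * ln(4/3)).
sigma, height = 0.5, 2.0
width_75 = 2.0 * sigma * math.sqrt(2.0 * math.log(4.0 / 3.0))
estimate = foley_area(height, width_75, 1.0)
exact = height * sigma * math.sqrt(2.0 * math.pi)
relative_error = (estimate - exact) / exact
```

For the gaussian case the estimate falls slightly below the exact area, by well under the 1.1% bias figure quoted for well-resolved symmetric peaks.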

Foley defines the relative valley as the ratio, expressed as a percentage, of the valley height h_v to the
apparent height h_p of the peak in question. The investigator reported that the bias of Equation (5) is less
than ±1.1% for well-resolved tailed (EMG) peaks having τ/σ in the range of 0-4.2 and for well resolved
symmetric (gaussian) peaks. For overlapping EMG peaks, or for an EMG peak overlapped by a gaussian
peak with area ratios between 1:4 and 4:1, empirical equation (5) is reported to be accurate to ±4% for the
first peak provided that the relative valley between peaks is less than 45%. For a symmetric peak
overlapped by an EMG peak, empirical equation (5) is accurate to within ±2% for the second peak if the
relative valley is less than 50%. For overlapping peaks with ratios outside the 1:4 and 4:1 range, equation
(5) is described as being somewhat more accurate but only for the larger peak of the overlapped pair.
Hence, not only is this method limited by the accuracy of the baseline correction, but also the method is
limited to EMG peaks having specified relationships.

Equation (5) is used to quantitate only one peak of an overlapped pair. But if an integrator is used to
measure the total area of the two overlapping peaks, the area of the remaining peak is determined by
subtraction of the calculated area from the total measured area.

All of the prior art methods described above, including the work of Foley and Dorsey, are dependent
upon an accurate determination of the baseline. Inaccuracies in the baseline determination can cause
significant errors in the derived properties for a peak. Since in practical chromatography a wandering
baseline and large overlapping peaks atop a wandering and indeterminate baseline are frequently encountered,
quantitation of either resolved or fused peaks using the prior art methods, described above, is often



Hence the methods described above are not suitable to analysis of chromatographic data without
baseline resolution. In general terms, identification of individual chromatographic peaks in a chromatogram
is a pattern recognition problem because the chromatogram typically consists of one or more peaks of a
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pattern of resolved and fused peaks in the chromatogram. A second problem is determining a set of
characterizing parameters for each identified peak. In other areas of science, methods have been developed
for pattern recognition. For example, a neural net, sometimes called a neural network, has been taught to
complete numerical sequences based upon training the net with other numerical sequences.

A neural net includes input units, internal units and output units. The input units are a first layer of the
neural net. The internal units may be configured in one or more layers and the output units are the final
layer in the net. Each of the input units supplies a signal to each of the internal units in the layer of the net
adjacent to the input units. If the neural net has more than one layer of internal units, each unit in the first
layer of internal units, i.e., the internal units receiving signals from the input units, generates an output
signal that is provided to each internal unit in the second layer of internal units. Other layers of internal units
are connected to adjacent layers of internal units in a similar manner. Each internal unit in the layer of
internal units adjacent to the layer of output units provides a signal to each output unit. Each output unit
provides an output signal.

For each neural net, the problem is to develop the internal units of the net so that for a given input
pattern the net generates the appropriate output pattern. This requires that the net be trained to recognize
the input patterns and generate the corresponding output patterns. For a specific example of configuring a
neural net as an exclusive OR gate see D. E. Rumelhart, "Learning Internal Representations by Error
Propagation," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1,
D. E. Rumelhart and J. L. McClelland (eds.), Cambridge, MA, MIT Press, pp. 318-362.
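
The layered signal flow described above, and the exclusive OR configuration mentioned as an example, can be sketched in a few lines. The weights below are chosen by hand for illustration; they are not taken from the cited reference and are not the result of training. The sigmoid activation and the names `forward` and `XOR_NET` are likewise assumptions of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Propagate inputs through fully connected layers.

    Each layer is a list of units; each unit is (weights, bias). Every unit
    receives the signals of all units in the preceding layer.
    """
    signals = list(inputs)
    for layer in layers:
        signals = [
            sigmoid(sum(w * s for w, s in zip(weights, signals)) + bias)
            for weights, bias in layer
        ]
    return signals


# Hand-picked weights: two internal units acting as OR and NAND, one output as AND.
XOR_NET = [
    [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)],
    [([20.0, 20.0], -30.0)],
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward(XOR_NET, [a, b])[0]))
```

Training, as the text notes, is the process of finding such weights automatically so that each input pattern yields the desired output pattern.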

While pattern recognition methods, such as neural nets, are known, to the best of my knowledge such
methods have not been used for either identification of chromatographic peaks in a chromatogram or
characterization of identified chromatographic peaks. Accordingly, the prior art methods, as described
above, for identification and characterization are limited by the requirement for a prior baseline determination
which can bias the data. Moreover, the prior determination of a baseline for a fused peak sequence or a
negative peak when the baseline is drifting is problematic. In addition to a prior baseline determination, the
previously described prior art methods for analysis of fused peak sequences require specific relationships
between the peaks and an empirical relationship for evaluation of the peak area. Thus, a method and
apparatus for analyzing a chromatogram without a prior baseline determination is needed to overcome the
prior art limitations.


SUMMARY OF THE INVENTION 

Unlike prior art methods for characterization of chromatograms which required prior determination of a
baseline, the method and apparatus of this invention determine the retention time, peak width, peak area,
and peak skewness for chromatographic data having resolved peaks, slightly fused peaks of the same sign,
slightly fused peaks of opposite sign, strongly fused peaks of the same sign, and strongly fused peaks of
opposite sign without a prior determination of a baseline. The retention time, peak width, peak area and
peak skewness for each chromatographic peak, a set of characterizing parameters for a peak, and the
baseline signal level for the peak are determined together.

The set of characterizing parameters for a chromatographic peak and a baseline for that peak are
determined simultaneously, i.e., in the same process step. Accordingly, the baseline correction prior to
characterization of the data, as in the prior art methods described above, has been eliminated. Further, the
above prior art methods used signals having about the same magnitude as the baseline signal to define the
baseline. The signal-to-noise ratio in the baseline determination was small and the correction of the data for
the baseline contribution was often difficult. In the system and method of this invention, information about
the peak crest is used to estimate the peak shape and in turn the characterizing set of parameters for the
peak and the baseline. Since the signal-to-noise ratio is typically the largest about the peak crest, the prior
art problems associated with the small signal-to-noise ratio in baseline corrections are not encountered.

According to the principles of this invention, extrema of the curvature of the chromatographic signal are
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curvature extrema for both positive and negative peaks are determined in the same manner so that, in
contrast to prior art methods described above, characterization of negative peaks with a wandering baseline
is fully equivalent to characterization of positive peaks.
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detector signal as a function of time. The level of the detector signal means the actual signal from the
detector. The slope of the detector signal means the first derivative with respect to time of the detector
signal (first derivative) and the curvature means the second derivative with respect to time of the detector
signal (second derivative).

To characterize a resolved peak 10, 20 (Fig. 1A), when one or more of these parameters (the level,
slope or curvature of the detection signal) exceeds a threshold level, a first baseline reference point, as
described below, is usually established at a predetermined time prior to the time when the threshold level
was exceeded. The threshold level is typically determined by examination of a chromatogram. (The
threshold level is generally selected as the level corresponding to the baseline signal level, e.g., signal level
50₁ (Fig. 1A).) After establishment of the first baseline reference point, the detector signal is continuously
integrated until the parameter being monitored falls below the threshold level for a selected period of time.
When the parameter falls below the threshold level for the selected period of time, a second baseline
reference point is established. A straight line is generally interpolated between the first baseline reference
point and the second baseline reference point.

Fig. 1B illustrates a common application of this method. In Fig. 1B, a single resolved peak 10 is shown
with baseline signal levels 50₁, 50₂. A first threshold level is represented by dashed line 50₃. At point a,
which occurs at time t_a, the detector signal level exceeds threshold level 50₃. A first baseline reference
point b is generally established at time t_b, which is a predetermined time prior to time t_a. The detector
signal level is continuously integrated until time t_e. Time t_e is a selected period after the detector signal level
falls below a second threshold level 50₄ at point c. A second baseline reference point e is defined at time t_e.
In this example, the chromatogram of Fig. 1A was analyzed to establish the two different threshold levels
used in the baseline correction.

Thus the integrated signal, i.e., the total area under curve 10 between times t_b and t_e, includes the area
corresponding to the baseline signal. To obtain only the peak area, the area corresponding to the baseline
signal must be subtracted from the total area. The problem is to accurately estimate the area corresponding
to the baseline signal. In one case, the peak area is found by subtracting the area of the trapezoid [the
shaded area 10₁ in Fig. 1B] defined by (t_b, 0), (t_b, i_b), (t_e, i_e), (t_e, 0), where i_b and i_e are the detection signals
at times t_b and t_e respectively. Here the terms within parentheses are x, y coordinates with the x coordinate
being time and the y coordinate being detector output signal. Peak 10 after subtraction of area 10₁ is
illustrated in Fig. 2 as peak 10'. The peak characteristics, i.e., peak height, width, skewness and the time of
maximum signal, i.e., the retention time, are determined using the baseline corrected data of Fig. 2
as described below.
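
The prior art trapezoid correction described above amounts to a small calculation, sketched here under simple assumptions: the selected window runs from the first baseline reference point to the second, the signal is integrated with the trapezoidal rule, and the baseline is the straight line joining the two end samples. The function names and the synthetic gaussian test peak are illustrative only.

```python
import math


def integrate(times, signal):
    """Trapezoidal-rule area under the sampled signal."""
    return sum(
        0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def baseline_corrected_area(times, signal):
    """Total area minus the trapezoid under the line joining the two end points."""
    t_b, t_e = times[0], times[-1]
    i_b, i_e = signal[0], signal[-1]
    trapezoid = 0.5 * (i_b + i_e) * (t_e - t_b)
    return integrate(times, signal) - trapezoid


# Gaussian peak (height 3, sigma 0.8) on a sloping baseline 1 + 0.2 t.
times = [i * 0.05 for i in range(241)]
signal = [
    1.0 + 0.2 * t + 3.0 * math.exp(-0.5 * ((t - 6.0) / 0.8) ** 2)
    for t in times
]
area = baseline_corrected_area(times, signal)
expected = 3.0 * 0.8 * math.sqrt(2.0 * math.pi)
```

The correction is exact here only because the baseline really is a straight line and both reference points lie far from the peak; an error in either reference point feeds directly into the reported area, which is the weakness the text goes on to discuss.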

Thus peak integration is generally done by (i) detecting a point of departure from baseline, i.e.,
detecting the time at which one or more of the detector signal level, the slope or curvature exceed a
threshold level; (ii) establishing a first baseline reference point at some predefined time prior to the point of
departure from baseline; and (iii) continuously integrating the chromatographic signal from the first baseline
reference point until the signal again drops below the threshold level for a predetermined time, i.e., the
second baseline reference point.
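
The three steps listed above can be sketched as a search for the integration window. The sketch monitors only the signal level (not slope or curvature), and the parameter names `threshold`, `lead` and `hold`, expressed in samples, are hypothetical stand-ins for the threshold level, the predefined lead time and the predetermined hold time.

```python
def find_integration_window(samples, threshold, lead, hold):
    """Return (start, end) sample indices of the integration window, or None.

    start: `lead` samples before the first sample above the threshold.
    end:   the sample at which the signal has stayed at or below the
           threshold for `hold` consecutive samples.
    """
    first_above = None
    for i, value in enumerate(samples):
        if value > threshold:
            first_above = i
            break
    if first_above is None:
        return None  # the signal never departs from baseline
    start = max(0, first_above - lead)
    below = 0
    for j in range(first_above + 1, len(samples)):
        below = below + 1 if samples[j] <= threshold else 0
        if below >= hold:
            return start, j
    return start, len(samples) - 1  # the record ended before the hold time elapsed


window = find_integration_window(
    [0, 0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0], threshold=0.5, lead=2, hold=3
)
```

Small changes to `threshold`, `lead` or `hold` shift both end points of the window, which illustrates why results from this style of integration are sensitive to the control settings.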

Conventional chromatographic peak quantitation apparatus require that parameters be set which control
analysis of the detector signal, e.g., the baseline-threshold level, and sometimes small changes in the
control settings or detector input signals can cause very large changes in the peak quantitation determination.
Thus, present methods for analysis of chromatographic data are unstable.

An alternative to the above method for correcting the measured data for the baseline signal level is to
obtain a blank chromatogram (a chromatographic run with no sample injection). The blank chromatogram is
subtracted from the chromatogram of interest before peak integration so as to obtain a baseline corrected
chromatogram. In either approach, the trapezoid subtraction or the blank chromatogram subtraction, the
determination of peak parameters is dependent upon the accuracy of the baseline correction.
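
As a minimal sketch of the blank chromatogram subtraction, assuming both runs were sampled at identical times, the correction is a point-by-point difference. The function name `subtract_blank` and the synthetic drifting baseline are assumptions made for illustration only.

```python
def subtract_blank(sample_run, blank_run):
    """Subtract a blank run from a sample run, point by point.

    Assumes both runs share the same sampling times (an assumption of this sketch).
    """
    if len(sample_run) != len(blank_run):
        raise ValueError("sample and blank runs must have the same length")
    return [s - b for s, b in zip(sample_run, blank_run)]

# Assumed data: a linearly drifting baseline with a triangular peak on top
blank = [1.0 + 0.01 * i for i in range(50)]
peak = [max(0.0, 5.0 - abs(i - 25)) for i in range(50)]
sample = [b + p for b, p in zip(blank, peak)]

corrected = subtract_blank(sample, blank)
```

Any difference between the blank run and the true baseline of the sample run passes directly into `corrected`, which is why the peak parameters depend on the accuracy of the correction.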

As previously described, peak 10 of Fig. 2 is a baseline corrected representation of peak 10 of Figs.
1A and 1B. First and second baseline reference points b, e were determined and baseline corrected
peak 10 was defined as the peak above the straight line between the first and second baseline reference points
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the x axis in Fig. 2, of peak 10. In one measurement of asymmetry, i.e., peak skewness, the asymmetry is
determined by drawing a horizontal line, parallel to the x-axis, at 10% of vertical height h. The distance A from
vertical line h to the left edge of the peak is the leading peak half width and distance B from vertical line h



EP 0 395 481 A2 

peak half widths at different fractions of vertical height h.
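
The leading and trailing half widths at a chosen fraction of the peak height can be measured from baseline corrected data roughly as sketched below. The helper name `half_widths`, the linear interpolation between samples, and both synthetic peaks are assumptions of this sketch rather than anything specified in the text.

```python
import math

def half_widths(times, signal, fraction=0.10):
    """Return (A, B): leading and trailing half widths at `fraction` of peak height.

    Assumes baseline corrected data containing a single peak.
    """
    apex = max(range(len(signal)), key=signal.__getitem__)
    level = fraction * signal[apex]

    def crossing(i, j):
        # time at which the signal crosses `level` between samples i and j
        yi, yj = signal[i], signal[j]
        return times[i] + (level - yi) * (times[j] - times[i]) / (yj - yi)

    left = apex
    while left > 0 and signal[left] > level:
        left -= 1
    right = apex
    while right < len(signal) - 1 and signal[right] > level:
        right += 1
    t_left = crossing(left, left + 1) if signal[left] <= level else times[left]
    t_right = crossing(right - 1, right) if signal[right] <= level else times[right]
    return times[apex] - t_left, t_right - times[apex]

# Assumed test data: a symmetric gaussian and a tailed peak, both with apex at t = 10
times = [i * 0.05 for i in range(401)]
symmetric = [math.exp(-0.5 * (t - 10.0) ** 2) for t in times]
tailed = [math.exp(-0.5 * ((t - 10.0) / (1.0 if t < 10.0 else 2.0)) ** 2) for t in times]

a_sym, b_sym = half_widths(times, symmetric)
a_tail, b_tail = half_widths(times, tailed)
print(a_sym, b_sym, a_tail, b_tail)
```

For the symmetric peak the two half widths agree, while for the tailed peak distance B exceeds distance A.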

If a peak is gaussian in shape, distance A, the leading peak half width, equals distance B, the trailing
peak half width. If a peak is gaussian, the column efficiency in terms of the number of theoretical plates N is
easily determined. The general definition of column efficiency in units of theoretical plates is

$$N = \frac{T_r^2}{\sigma^2} \qquad (1)$$

where $T_r$ is the retention time for the peak and $\sigma^2$ is the variance or the second central moment of the peak
measured in time units. For a gaussian curve,

$$\sigma^2 = \frac{A^2}{2\pi h^2} \qquad (2)$$

where A is the area of the gaussian peak and h is the peak height. Substituting Equation 2 into Equation 1
gives

$$N = \frac{2\pi (h T_r)^2}{A^2} \qquad (3)$$

Therefore, column efficiency as measured by theoretical plates can be determined from the vertical peak
height h, the retention time and the area of a gaussian curve.
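
As a quick numerical check of this result, the sketch below evaluates the plate number for a gaussian peak both from the general definition (retention time squared over variance) and from the height, retention time and area. The function names and the numerical values are illustrative assumptions.

```python
import math

def plates_from_variance(t_r, variance):
    """General definition: N = T_r^2 / sigma^2."""
    return t_r ** 2 / variance

def plates_gaussian(height, t_r, area):
    """Gaussian-only form: N = 2*pi*(h*T_r)^2 / A^2."""
    return 2.0 * math.pi * (height * t_r) ** 2 / area ** 2

# Assumed gaussian peak: sigma = 0.5, retention time = 10, area = 3
sigma, t_r, area = 0.5, 10.0, 3.0
height = area / (sigma * math.sqrt(2.0 * math.pi))  # apex height of a gaussian

n_definition = plates_from_variance(t_r, sigma ** 2)
n_gaussian = plates_gaussian(height, t_r, area)
print(n_definition, n_gaussian)
```

Both expressions give the same plate number here because the peak is exactly gaussian; the agreement does not carry over to asymmetric peaks.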

Non-Gaussian peaks (nonsymmetric peaks) in which distance A (Fig. 2) is not equal to distance B are
typically encountered in chromatographic measurements, as discussed more completely below. However,
the use of an equation such as equation (3), which is based upon a symmetric peak, to determine
theoretical plate numbers for nonsymmetric peaks can result in serious errors.

Several researchers have used an exponentially modified gaussian (EMG) model for quantitation of 
chromatographic peaks. The development, characterization and theoretical and experimental justification of 
the exponentially modified gaussian (EMG) model has been discussed in several different references. See 
for example, R. E. Pauls and L. B. Rogers, Anal. Chem. 49, 628, 1977, or E. Grushka et al., Anal. Chem. 41,
889-892, 1969.

The exponentially modified gaussian function G(t) is a convolution of a gaussian function and an 
exponential decay function and is expressed as: 

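$$G(t) = \frac{A}{\tau} \exp\left[\frac{1}{2}\left(\frac{\sigma}{\tau}\right)^2 - \frac{t - t_G}{\tau}\right] \int_{-\infty}^{z} \frac{e^{-y^2/2}}{\sqrt{2\pi}}\, dy \qquad (4)$$

where

$$z = \frac{t - t_G}{\sigma} - \frac{\sigma}{\tau}$$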

In equation 4, A is an amplitude which corresponds to the peak area, $t_G$ is the time of maximum amplitude
of the gaussian function, $\sigma$ is the standard deviation of the gaussian function and $\tau$ is the time constant of the
exponential decay function. The ratio $\tau/\sigma$ is a measure of peak asymmetry. As $\tau/\sigma$ increases, the
chromatographic peak becomes more tailed. Conversely, as $\tau/\sigma$ approaches zero, the chromatographic peak
approaches a gaussian shape.
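
The exponentially modified gaussian can be evaluated numerically as in the rough sketch below, which writes the integral in terms of the complementary error function. It assumes the standard convolution form of the EMG, in which the prefactor is the peak area divided by the time constant; the function name `emg`, the parameter values and the sampling grid are assumptions of this sketch. It also compares the gaussian-only plate formula, equation (3), with the general definition, equation (1), using the fact that the variance of an EMG is the sum of the squared standard deviation and the squared time constant.

```python
import math

def emg(t, area, t_g, sigma, tau):
    """Exponentially modified gaussian (standard convolution form, assumed here)."""
    z = (t - t_g) / sigma - sigma / tau
    exponent = 0.5 * (sigma / tau) ** 2 - (t - t_g) / tau
    # integral of the unit normal density from -infinity to z
    phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return (area / tau) * math.exp(exponent) * phi

# Assumed parameters: a clearly tailed peak with tau / sigma = 2
area, t_g, sigma, tau = 5.0, 10.0, 1.0, 2.0
dt = 0.01
times = [i * dt for i in range(6001)]
values = [emg(t, area, t_g, sigma, tau) for t in times]

# numerical area under the curve (trapezoidal rule)
numeric_area = sum(0.5 * (values[i] + values[i + 1]) * dt
                   for i in range(len(values) - 1))

# apex of the tailed peak
apex = max(range(len(values)), key=values.__getitem__)
t_max, h_max = times[apex], values[apex]

# plate number from the gaussian-only formula, equation (3)
n_gaussian_formula = 2.0 * math.pi * (h_max * t_max) ** 2 / numeric_area ** 2
# plate number from equation (1), using the EMG variance sigma^2 + tau^2
n_definition = t_max ** 2 / (sigma ** 2 + tau ** 2)
print(numeric_area, t_max, n_gaussian_formula, n_definition)
```

For these assumed parameters the apex lies later than the gaussian center, and the gaussian-only formula overstates the plate number considerably relative to the definition, which illustrates the error discussed above for nonsymmetric peaks.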
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Method and apparatus for estimation of parameters describing chromatographic peaks.



A two-step process for characterization of peaks in a chromatogram is disclosed. In a first step, data
corresponding to each peak or each pair of peaks in the chromatogram is identified. A unique filter apparatus
locates extrema of the curvature of the chromatographic data and a data file is generated containing
characteristics of the extrema. A pattern recognition apparatus analyzes the characteristics of the located extrema and
classifies the peak or peak combination represented by the data in the file as one peak or peak combination in a
set of resolved peaks and selected combinations of resolved peaks. A portion of the chromatographic data,
which corresponds to the peak or peak combination identified by the pattern recognition apparatus, is identified.
This portion of the data includes both the signal for said peak and the signal for the baseline upon which the
peak is superimposed.

In the second step, data for a peak or a peak combination identified as described above, or in the
alternative, identified by some other process, is processed and a set of characterizing parameters for the peak or
the first peak in the peak combination is generated without a prior baseline correction to the data. The peak data
including the baseline level upon which the peak is superimposed is analyzed using one of lookup tables, neural
nets, curve fitting, or combinations of lookup tables, neural nets and curve fitting. Each of these characterization
processes, using information about the peak crest and the peak inflection points, determines a set of
characterizing parameters and a baseline estimate that best fit the identified data. Thus, the peak
characterization according to the principles of this invention is not biased by a prior baseline correction.
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METHOD AND APPARATUS FOR ESTIMATION OF PARAMETERS DESCRIBING CHROMATOGRAPHIC PEAKS



FIELD OF THE INVENTION 

This invention relates generally to a method and apparatus for quantitatively defining gas and liquid
chromatography data and more specifically to a method and apparatus for quantifying the signals from a
detector used in gas and liquid chromatography so as to determine retention times, areas, widths, heights
and skewness of measured chromatographic peaks.



BACKGROUND OF THE INVENTION

Gas and liquid chromatography are widely used analytic techniques for separation and quantitation of
mixtures of chemical compounds. In chromatographic analyses, a small sample of a mixture is introduced
initially at the top of a column coated or packed with an adsorbent. "Top" as used herein is a relative term
and means the end or region of a chromatographic column where the sample is initially introduced to the
column. The adsorbent reversibly adsorbs components of the mixture. Thus, initially the sample is bound to
the adsorbent at the top of the column. A carrier gas or liquid, referred to as an eluent, is passed through
the column. As the eluent passes through the column, the components of the sample are displaced from
the adsorbent by the eluent and then the components are adsorbed at another point on the column.

The various components of the mixture migrate through the column at different rates. The rate of
migration of each component depends upon the affinity of the adsorbent for the component and the ability
of the eluent to displace the component from the adsorbent as well as other factors known to those skilled
in the art. Accordingly, different components of the sample migrate down the column at different speeds.
Thus, as the carrier gas or liquid emerges from the column, the components in the mixture are swept out of
the column with the carrier gas or liquid at various time intervals, i.e., retention times, after the introduction
of the sample at the top of the column.

To measure the retention times of the various components in the sample, a detector is placed at about
the exit of the column so that the eluate emerging from the column passes through the detector. Typically
in liquid chromatography, the detector employs either ultraviolet absorbance, refractive index or fluorescence
as the measurement means. In gas chromatography, flame ionization or thermal conductivity are frequently
employed as the detection principle.

Independent of the detector used for the chromatographic measurement, the detector generates an
electrical signal which changes as a function of time in response to the concentration of the components of
the sample passing through the detector. Fig. 1A illustrates the features of a typical chromatogram. The
vertical line at the left hand side of Fig. 1A represents the introduction of the sample at the top of the
chromatographic column and the initiation of eluent flow through the column. A first resolved peak 10
reaches a maximum at a time T1 while a second resolved peak 20 reaches a maximum at a time T2. Time
T1 is the retention time for peak 10 while time T2 is the retention time for peak 20. Peaks 30, 40 are fused
peaks.

A baseline signal level, which is usually defined as the signal level associated with only eluent flow
through a chromatographic column, is represented by the signal level 50₁ prior to peak 10, the signal level
50₂ between peak 10 and peak 20, and so forth. The signal level for peak 10 consists of a signal level
associated with a component of the sample passing through the detector and a baseline signal level
associated with eluent flow through the detector. As described more completely below, a central problem in
interpretation of a chromatogram is determining the contribution of the baseline signal level to the measured
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specific components of the sample as well as performance indicators on the operation of the
chromatographic column. Information typically generated for each peak, as described more completely below,
includes peak characteristics such as retention time, peak area, peak height, peak width and skewness.

To ascertain the characteristics of resolved chromatographic peaks 10, 20 as illustrated in Fig. 1B
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